Can an ANN of 2 neurons solve XOR? - algorithm

I know that an artificial neural network (ANN) of 3 neurons in 2 layers can solve XOR
Input1----Neuron1\
        \ /        \
         X          +------->Neuron3
        / \        /
Input2----Neuron2/
But to minify this ANN, can just 2 neurons (Neuron1 takes 2 inputs, Neuron2 takes only 1 input) solve XOR?
Input1\
       \
        Neuron1------->Neuron2
       /
Input2/
The artificial neuron receives one or more inputs...
https://en.wikipedia.org/wiki/Artificial_neuron
Bias input '1' is assumed to be always there in both diagrams.
Side notes:
A single neuron can solve XOR, but only with an additional input such as x1*x2 or x1+x2
https://www.quora.com/Why-cant-the-XOR-problem-be-solved-by-a-one-layer-perceptron/answer/Razvan-Popovici/log
Could the ANN in the second diagram solve XOR with an additional input like the above fed to Neuron1 or Neuron2?

No that's not possible, unless (maybe) you start using some rather strange, unusual activation functions.
Let's first ignore neuron 2, and pretend that neuron 1 is the output node. Let x0 denote the bias value (always x0 = 1), let x1 and x2 denote the input values of an example, let y denote the desired output, and let w0, w1, w2 denote the weights from the x's to neuron 1. With the XOR problem, we have the following four examples:
x0 = 1, x1 = 0, x2 = 0, y = 0
x0 = 1, x1 = 1, x2 = 0, y = 1
x0 = 1, x1 = 0, x2 = 1, y = 1
x0 = 1, x1 = 1, x2 = 1, y = 0
Let f(.) denote the activation function of neuron 1. Then, assuming we can somehow train our weights to solve the XOR problem, we have the following four equations:
f(w0 + x1*w1 + x2*w2) = f(w0) = 0
f(w0 + x1*w1 + x2*w2) = f(w0 + w1) = 1
f(w0 + x1*w1 + x2*w2) = f(w0 + w2) = 1
f(w0 + x1*w1 + x2*w2) = f(w0 + w1 + w2) = 0
Now, the main problem is that the activation functions that are typically used (ReLUs, sigmoid, tanh, the identity function... maybe others) are nondecreasing. That means that if you give one a larger input, you also get a larger (or equal) output: f(a + b) >= f(a) if b >= 0. If you look at the above four equations, you'll see this is a problem. Comparing the second and third equations to the first tells us that w1 and w2 need to be positive, because they need to increase the output in comparison to f(w0). But then the fourth equation won't work out, because it will give an even greater output instead of 0.
I think (but didn't actually try to verify, maybe I'm missing something) that it would be possible if you use an activation function that goes up first and then comes down again. Think of something like f(x) = -(x^2), with some extra term to shift it away from the origin. I don't think such activation functions are commonly used in neural networks. I suspect they'll behave less nicely during training, and they are not plausible from a biological point of view (remember that neural networks are at least inspired by biology).
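To illustrate that last point, here is a tiny Python sketch (my own, with a made-up "bump" activation that is not a standard choice): a single neuron with a non-monotonic activation does separate XOR, which shows that monotonicity is the real obstacle.

# Not a standard setup: a single neuron with a non-monotonic, bump-shaped
# activation. Weights w0 = -1, w1 = w2 = 1 map the four XOR inputs to
# -1, 0, 0, 1, and the bump fires only near zero.
def bump(s):
    return 1 if abs(s) < 0.5 else 0

def neuron(x1, x2, w0=-1.0, w1=1.0, w2=1.0):
    return bump(w0 + w1 * x1 + w2 * x2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, neuron(x1, x2))   # prints 0, 1, 1, 0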
Now, in your question you also added an extra link from neuron 1 to neuron 2, which I ignored in the discussion above. The problem is still the same though: the activation level of neuron 1 in the fourth case is always going to be higher than (or at least as high as) in the second and third cases. Neuron 2 would typically again have a nondecreasing activation function, so it would not be able to change this (unless you put a negative weight between hidden neuron 1 and output neuron 2, in which case you flip the problem around and will predict too high a value for the first case).
EDIT: Note that this is related to Aaron's answer, which is essentially also about the problem of nondecreasing activation functions, just using more formal language. Give him an upvote too!

It's not possible.
Firstly, you need as many network inputs as XOR has inputs: the smallest ANN capable of modelling any binary operation takes two inputs, whereas the second diagram shows only one input and one output.
Secondly, and this is probably the most direct refutation: the XOR function's output is not an additive or multiplicative relationship, but it can be modelled using a combination of them. A neuron is generally modelled with functions such as sigmoids or lines, which have no stationary points, so a single layer of neurons can only roughly approximate an additive or multiplicative relationship.
What this means is that a minimum of two layers of processing are required to produce a XOR operation.
This question brings up an interesting topic of ANNs. They are well-suited to identifying fuzzy relationships, but tend to require at least as much network complexity as any mathematical process which would solve the problem with no fuzzy margin for error. Use ANNs where you need to identify something which looks mostly like what you are identifying, and use math where you need to know precisely whether something matches a set of concrete traits.
Understanding the distinction between ANN and mathematics opens up the possibility of combining the two in more powerful calculation pipelines, such as identifying possible circles in an image using ANN, using mathematics to pin down their precise origins, and using a second ANN to compare those origins to the configurations on known objects.

It is absolutely possible to solve the XOR problem with only two neurons.
Take a look at the model below.
This model solves the problem easily.
The first neuron represents logical AND and the other logical OR. The value of +1.5 for the threshold of the hidden neuron ensures that it will be turned on only when both input units are on. The value of +0.5 for the output neuron ensures that it will turn on only when it receives a net positive input greater than +0.5. The weight of -2 from the hidden neuron to the output one ensures that the output neuron will not come on when both input neurons are on (ref. 2).
ref. 1: Hazem M El-Bakry, Modular neural networks for solving high complexity problems (link)
ref. 2: D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning representation by error backpropagation, Parallel distributed processing: Explorations in the Microstructures of Cognition, Vol. 1, Cambridge, MA: MIT Press, pp. 318-362, 1986.
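As a quick check of the description above, here is a minimal Python sketch (my own code, using the weights and thresholds quoted from the text: both inputs feed the hidden neuron and the output neuron with weight +1, the thresholds are +1.5 and +0.5, and the hidden-to-output weight is -2):

def step(v, threshold):
    return 1 if v > threshold else 0

def xor_net(x1, x2):
    hidden = step(x1 + x2, 1.5)               # fires only when both inputs are on (AND)
    output = step(x1 + x2 - 2 * hidden, 0.5)  # OR, vetoed by the AND neuron
    return output

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))        # prints 0, 1, 1, 0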

Of course it is possible. But before solving the XOR problem with two neurons I want to discuss linear separability. A problem is linearly separable if a single hyperplane can form the decision boundary. (A hyperplane is just a plane drawn to differentiate the classes. For an N-dimensional problem, i.e., a problem having N features as inputs, the hyperplane will be an (N-1)-dimensional plane.) So for a 2-input XOR problem the hyperplane will be a one-dimensional plane, that is, a "line".
Now coming to the question: XOR is not linearly separable, hence we cannot directly solve the XOR problem with two neurons. The following images show that no matter how we draw a line in 2D space, we cannot separate one side's output from the other's. For example, in the first one, the inputs (0,1) and (1,0) both make XOR give 1, but for the input (1,1) the output is 0, and we cannot separate it; unfortunately it falls on the same side.
So here we have two options to solve it:
Use a hidden layer. But this will increase the number of neurons beyond two.
The other option is to increase the dimensions.
Let's have an illustration of how increasing the dimensions can solve this problem while keeping the number of neurons at 2.
For an analogy, we can think of XOR as subtracting AND from OR, like below:
If you look at the upper figure, the first neuron mimics logical AND after passing "v = (-1.5) + (x1*1) + (x2*1)" to some activation function, and the output is taken as 0 or 1 depending on whether v is negative or positive, respectively (I am not getting into the details... hope you got the point). In the same way, the next neuron mimics logical OR.
So for the first three cases of the truth table the AND neuron remains turned off. But for the last one (which is exactly where OR differs from XOR) the AND neuron turns on and feeds a big negative value to the OR neuron, big enough to drive the total summation negative. So finally the activation function of the second neuron interprets it as 0.
In this way we can make XOR with 2 neurons.
The following two figures, which I have collected, also solve your question:

The problem can be split in two parts.
Part one
a b c
-------
0 0 0
0 1 1
1 0 0
1 1 0
Part two
a b d
-------
0 0 0
0 1 0
1 0 1
1 1 0
Part one can be solved with one neuron.
Part two can also be solved with one neuron.
Part one and part two added together make the XOR.
c = sigmoid(a * 6.0178 + b * -6.6000 + -2.9996)
d = sigmoid(a * -6.5906 + b * 5.9016 + -3.1123)
----------------------------------------------------------
sigmoid(0.0 * 6.0178 + 0 * -6.6000 + -2.9996) + sigmoid(0.0 * -6.5906 + 0 * 5.9016 + -3.1123) = 0.0900
sigmoid(1.0 * 6.0178 + 0 * -6.6000 + -2.9996) + sigmoid(1.0 * -6.5906 + 0 * 5.9016 + -3.1123) = 0.9534
sigmoid(0.0 * 6.0178 + 1 * -6.6000 + -2.9996) + sigmoid(0.0 * -6.5906 + 1 * 5.9016 + -3.1123) = 0.9422
sigmoid(1.0 * 6.0178 + 1 * -6.6000 + -2.9996) + sigmoid(1.0 * -6.5906 + 1 * 5.9016 + -3.1123) = 0.0489
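For completeness, a small Python script (my own) that evaluates the two neurons with the weights given above and reproduces the four sums:

import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def c(a, b):   # high only for (a, b) = (1, 0)
    return sigmoid(a * 6.0178 + b * -6.6000 + -2.9996)

def d(a, b):   # high only for (a, b) = (0, 1)
    return sigmoid(a * -6.5906 + b * 5.9016 + -3.1123)

for a, b in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(a, b, round(c(a, b) + d(a, b), 4))   # ~0.09, ~0.95, ~0.94, ~0.05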


How to fix skew trapezoidal distribution sampling output sample size

I am trying to generate a skewed trapezoidal distribution using inverse transform sampling.
The inputs are the values where the ramps start and end (a, b, c, d) and the sample size.
a=-3;b=-1;c=1;d=8;
SampleSize=10e4;
h=2/(d+c-a-b);
Then I calculate the ratio of the length of ramps and flat components to get sample size for each:
firstramp=round(((b-a)/(d-a)),3);
flat=round((c-b)/(d-a),3);
secondramp=round((d-c)/(d-a),3);
n1=firstramp*SampleSize; %sample size for first ramp
n3=secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
And then finally I get the histogram from the following code:
quartile1=h/2*(b-a);
quartile2=1-h/2*(d-c);
y1=linspace(0,quartile1,n1);
y2=linspace(quartile1,quartile2,n2);
y3=linspace(quartile2,1,n3);
%inverse cumulative distribution functions
invcdf1=a+sqrt(2*(b-a)/h)*sqrt(y1);
invcdf2=(a+b)/2+y2/h;
invcdf3=d-sqrt(2*(d-c)/h)*sqrt(1-y3);
distr=[invcdf1 invcdf2 invcdf3];
histogram(distr,100)
However, the sampling of the ramp and flat components is not equal; it looks like this:
I fixed this by trial and error, by reducing the sample size of the ramps by half:
n1=0.5*firstramp*SampleSize; %sample size for first ramp
n3=0.5*secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
This made the distribution look like this:
However, this makes the number of output samples smaller than the requested sample size.
I've also tried different combinations of changing the sample sizes of ramps and flat.
This also works:
n1=0.75*firstramp*SampleSize; %sample size for first ramp
n3=0.75*secondramp*SampleSize; %sample size for second ramp
n2=1.5*flat*SampleSize;
It increases the output samples, but it's still not close.
Any help will be appreciated.
Full code:
a=-3;b=-1;c=1;d=8;
SampleSize=10e4;%*1.33333333333333;
h=2/(d+c-a-b);
firstramp=round(((b-a)/(d-a)),3);
flat=round((c-b)/(d-a),3);
secondramp=round((d-c)/(d-a),3);
n1=firstramp*SampleSize; %sample size for first ramp
n3=secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
quartile1=h/2*(b-a);
quartile2=1-h/2*(d-c);
y1=linspace(0,quartile1,.75*n1);
y2=linspace(quartile1,quartile2,1.5*n2);
y3=linspace(quartile2,1,.75*n3);
%inverse cumulative distribution functions
invcdf1=a+sqrt(2*(b-a)/h)*sqrt(y1);
invcdf2=(a+b)/2+y2/h;
invcdf3=d-sqrt(2*(d-c)/h)*sqrt(1-y3);
distr=[invcdf1 invcdf2 invcdf3];
histogram(distr,100)
%end
I don't know Matlab so I was hoping somebody else would jump in on this, but since nobody did here goes.
If I'm reading your code correctly, what you did is not an inversion. Inversion is 1-1, i.e., one uniform input produces one outcome. You seem to be using a technique known as the "composition method". In composition, the overall distribution is composed of component pieces, each of which is straightforward to generate. You choose which component to generate from based on their proportions/probabilities relative to the whole. For density functions, probability is found as the area under the density curve, so your first mistake was sampling the components relative to the width of each component rather than using their areas. The correct sampling proportions are 2/13, 4/13, and 7/13 for what you designated the firstramp, flat, and secondramp components, respectively.
A second mistake (which is relatively minor) was to assign exact sample sizes to each of the components. Having probability 2/13 does not mean that exactly 2*SampleSize/13 of your samples will come from the firstramp; it means that's the expected sample size for that component. The expected value of a random variate is not necessarily (or even likely to be) the outcome you will actually get.
In pseudocode, the composition approach would be
generate U ~ Uniform(0,1)
if U <= 2/13:
    generate and return a value from firstramp
else if U <= 6/13:
    generate and return a value from flat
else:
    generate and return a value from secondramp
Note that since each of the generate options will use one or more uniforms, and choosing between the options requires a uniform U, this is not an inversion.
If you want an actual inversion, you need to quantify your density, integrate it to get the cumulative distribution function, then apply the inversion technique by setting F(X) = U and solving for X. Since your distribution is made of distinct components, both the density and cumulative density will be piecewise functions.
After deriving the height based on the requirement that the areas of the two triangles and the flat section must add up to 1, I came up with the following for your density:
       | (x + 3) / 13          -3 <= x <= -1
       |
f(x) = | 2 / 13                -1 <= x <=  1
       |
       | 2 * (8 - x) / 91       1 <= x <=  8
Integrating this and collecting terms produces the CDF:
       | (x + 3)**2 / 26                  -3 <= x <= -1
       |
F(x) = | (2 + x) * 2 / 13                 -1 <= x <=  1
       |
       | 6 / 13 + [49 - (x - 8)**2] / 91   1 <= x <=  8
Finally, determining the values of F(x) at the break points between the segments and applying inversion yields the following pseudocode algorithm:
generate U ~ Uniform(0,1)
if U <= 2 / 13:
    return 2 * sqrt( (13 * U) / 2 ) - 3
else if U <= 6 / 13:
    return (13 * U) / 2 - 2
else:
    return 8 - sqrt( 91 * (1 - U) )
Note that this is a true inversion. The outcome is determined by generating a single U, and transforming it in different ways depending on which range it falls in.
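Here is a direct Python translation of that inversion (my own sketch; the original code in the question is MATLAB, but the algorithm carries over one-to-one):

import math, random

def sample_trapezoid():
    # Inversion sampler for the piecewise density above
    # (a = -3, b = -1, c = 1, d = 8, peak height h = 2/13).
    u = random.random()
    if u <= 2 / 13:
        return 2 * math.sqrt(13 * u / 2) - 3     # left ramp
    elif u <= 6 / 13:
        return 13 * u / 2 - 2                    # flat section
    else:
        return 8 - math.sqrt(91 * (1 - u))       # right ramp

samples = [sample_trapezoid() for _ in range(100000)]
# e.g. histogram(distr,100) in MATLAB, or plt.hist(samples, 100) with matplotlib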

Square root calculation using continued fractions to n bits of precision

This is an unsolved problem from my past arbitrary-precision rational numbers C++ assignment.
For calculation, I used this expression from Wikipedia (a being the initial guess, r being its remainder):
I ended up, just by guessing from experiments, with this approach:
Use an integer square root function on the numerator/denominator, use that as the guess
Iterate the continued fraction until the binary length of the denominator was at least the target precision
This worked well enough to get me through the official tests; however, from my testing, the precision was too high (sometimes almost double), i.e. the code was inefficient, and I had no proof that it worked on every input (and hence no confidence in the code).
A simplified excerpt from the code (natural/rational store arbitrary length numbers, assume all operations return fractions in their simplest form):
rational sqrt(rational input, int precision) {
    rational guess(isqrt(input.numerator), isqrt(input.denominator)); // a
    rational remainder = input - power(guess, 2);                     // r
    rational result = guess;
    rational expansion;

    while (result.denominator.size() <= precision) {
        expansion = remainder / (2 * guess + expansion);
        result = guess + expansion;

        // Handle rational results (terminate early if the root is exact)
        if (power(result, 2) == input) {
            break;
        }
    }
    return result;
}
Can it be done better? If so, how?
Square roots can easily and very accurately be calculated with General Continued Fractions (GCF). Being general means that the numerators can be any positive number, in contrast to Regular or Simple Continued Fractions (RCF), where the numerators are all 1s. In order to comprehend the answer as a whole, it is best to start from the beginning.
The method used to find the square root of any positive number n by a GCF (a + x), where a is the integer part and x is the continued fractional part, is:

√n = a + x ⇒ n = a^2 + 2ax + x^2 ⇒ n − a^2 = x(2a + x) ⇒ x = (n − a^2) / (2a + x)
Right at this moment you have a GCF, since x nicely ends up in the denominator, and once you replace x with its own definition you get an indefinitely extending definition of x. Regarding a, you are free to choose it among the integers that are less than √n. So if you want to find √11 then a can be chosen among 1, 2 or 3. However, it is always better to choose the biggest one, in order to be able to simplify the GCF into an RCF at the next stage.
Remember that x = (n − a^2) / (2a + x) and n = 11 and a = 3. Now if we write the first two terms then we may simplify the GCF to RCF with all numerators as 1.
x = 2 / (6 + x)  ⇒  2 / (6 + 2 / (6 + x))  ⇒  (divide numerator and denominator by 2)  ⇒  1 / (3 + 1 / (6 + x)) = x
Accordingly our RCF for √11 is:

√11 = 3 + x ⇒ 3 + 1 / (3 + 1 / (6 + 1 / (3 + 1 / (6 + ...)))) = [3; 3, 6] (with 3, 6 repeating)
Notice the coefficient notation [3; 3, 6, 3, 6, ...], which in this particular case resembles an infinite array. This is how RCFs are expressed in coefficient notation: the first item is a, and the tail after the ; holds the RCF coefficients of x. These two are sufficient, since we already know that in an RCF all numerators are fixed to 1.
Coming back to your precision question: you now have √11 = 3 + x, where x is your RCF [3; 3, 6, 3, 6, 3, 6...]. Normally you can try picking a depth and reducing from the right, like [3,3,6,3,6,3,6...].reduceRight((p,c) => c + 1/p) as it would be done in JS. Not a precise enough result? Then try again from another depth. This is in fact what is described in the linked Wikipedia topic as bottom-up. However, it is much more efficient to go from left to right (top to bottom) by calculating the intermediate convergents one after the other, in a single pass. Every next intermediate convergent yields better precision, so you can test it and decide whether to stop or continue. When you reach a sufficiently good coefficient, just stop there. Having said that, once you reach the desired coefficient you may still do some fine tuning by increasing or decreasing that coefficient. Decreasing the coefficients at even indices or increasing the ones at odd indices decreases the convergent, and vice versa.
So, in order to be able to do a left-to-right (top-to-bottom) analysis, there is a special rule:
n2/d2 = (xn * n1 + n0) / (xn * d1 + d0)
We need to know the last two interim convergents (n0/d0 and n1/d1), along with the current coefficient xn, in order to be able to calculate the next convergent (n2/d2).
We will start with two initial convergents: Infinity (n0/d0 = 1/0) and the a that we've chosen above (remember √n = a + x), which is 3, so n1/d1 = 3/1. Knowing that the 3 before the semicolon is in fact a, our first xn is the 3 right after the semicolon in our coefficients array [3; »» 3 ««, 6, 3, 6, 3, 6...].
After we calculate n2/d2 and do our test, then, if need be, for the next step we shift our convergents to the left so that we have the last two ready to calculate the next convergent: n0/d0 <- n1/d1 <- n2/d2
Here I present the table for the n2/d2 = (xn * n1 + n0)/(xn * d1 + d0) rule.

n0/d0    n1/d1     xn   index    n2/d2       decimal val.
_____    ______    __   _____    ________    ____________
1/0      3/1       3    1 odd    10/3        3.33333333..
3/1      10/3      6    2 evn    63/19       3.31578947..
10/3     63/19     3    3 odd    199/60      3.31666666..
63/19    199/60    6    4 evn    1257/379    3.31662269..
.        .         .    .        .           .
.        .         .    .        .           .
So, as you may notice, we are very quickly approaching √11, which is 3.31662479... Note that the odd indices overshoot and the even ones undershoot, due to the cascading reciprocals. Since √11 is irrational, this will keep converging indefinitely, until we say enough.
Remember, as mentioned earlier, once you reach the desired coefficient you may still do some fine tuning by increasing or decreasing that coefficient (xn). Decreasing the coefficients at even indices or increasing the ones at odd indices decreases the convergent, and vice versa.
The problem here is that not all √n can be turned into an RCF by a simple division as shown above. For a more generalized way to generate an RCF from any √n you may check a more recent answer of mine.
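To make the left-to-right pass concrete, here is a small Python sketch (my own illustration, not from the original answer); it generates the RCF coefficients of √n with the standard m, d, a recurrence and walks the convergents until two successive ones agree to the requested tolerance:

import math
from fractions import Fraction

def sqrt_by_rcf(n, digits=12):
    # Approximate sqrt(n) for a non-square positive integer n by walking its
    # regular continued fraction left to right (top to bottom), as described above.
    a0 = math.isqrt(n)
    if a0 * a0 == n:
        return Fraction(a0)
    tol = Fraction(1, 10 ** digits)
    m, d, a = 0, 1, a0
    p0, q0 = 1, 0            # the "Infinity" convergent 1/0
    p1, q1 = a0, 1           # the first convergent, a0/1
    while True:
        # standard recurrence for the next RCF coefficient of sqrt(n)
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        # next convergent: (a * p1 + p0) / (a * q1 + q0)
        p0, q0, p1, q1 = p1, q1, a * p1 + p0, a * q1 + q0
        if abs(Fraction(p1, q1) - Fraction(p0, q0)) < tol:
            return Fraction(p1, q1)

approx = sqrt_by_rcf(11)
print(approx, float(approx))   # converges to 3.3166247903554...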

Compare two arrays of points [closed]

I'm trying to find a way to find similarities in two arrays of different points. I drew circles around points that have similar patterns and I would like to do some kind of auto comparison in intervals of, let's say, 100 points and tell what the coefficient of similarity is for that interval. As you can see, it might not be perfectly aligned either, so point-to-point comparison would not be a good solution (I suppose). Patterns that are slightly misaligned could also mean that they match the pattern (but obviously with a smaller coefficient).
What similarity could mean (1 coefficient is a perfect match, 0 or less - is not a match at all):
Points 640 to 660 - Very similar (coefficient is ~0.8)
Points 670 to 690 - Quite similar (coefficient is ~0.5-~0.6)
Points 720 to 780 - Let's say quite similar (coefficient is ~0.5-~0.6)
Points 790 to 810 - Perfectly similar (coefficient is 1)
Coefficient is just my thoughts of how a final calculated result of comparing function could look like with given data.
I read many posts on SO but it didn't seem to solve my problem. I would appreciate your help a lot. Thank you
P.S. Perfect answer would be the one that provides pseudo code for function which could accept two data arrays as arguments (intervals of data) and return coefficient of similarity.
I also think High Performance Mark has basically given you the answer (cross-correlation). In my opinion, most of the other answers are only giving you half of what you need (i.e., dot product plus compare against some threshold). However, this won't consider a signal to be similar to a shifted version of itself. You'll want to compute this dot product N + M - 1 times, where N, M are the sizes of the arrays. For each iteration, compute the dot product between array 1 and a shifted version of array 2. The amount you shift array 2 increases by one each iteration. You can think of array 2 as a window you are passing over array 1. You'll want to start the loop with the last element of array 2 only overlapping the first element in array 1.
This loop will generate numbers for different amounts of shift, and what you do with that number is up to you. Maybe you compare it (or the absolute value of it) against a threshold that you define to consider two signals "similar".
Lastly, in many contexts, a signal is considered similar to a scaled (in the amplitude sense, not time-scaling) version of itself, so there must be a normalization step prior to computing the cross-correlation. This is usually done by scaling the elements of the array so that the dot product with itself equals 1. Just be careful to ensure this makes sense for your application numerically, i.e., integers don't scale very well to values between 0 and 1 :-)
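To make that concrete, here is a short numpy sketch (my own illustration; numpy's correlate in "full" mode performs the N + M - 1 shifts for you):

import numpy as np

def normalized_xcorr(a, b):
    # Cross-correlate two 1-D signals after normalizing each to unit energy,
    # so the peak can be read as a similarity coefficient (1 = identical up
    # to amplitude scaling and shift).
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a)                 # dot product with itself becomes 1
    b = b / np.linalg.norm(b)
    return np.correlate(a, b, mode='full')    # one value per shift, length N + M - 1

x = np.sin(np.linspace(0, 4 * np.pi, 100))
y = np.concatenate([np.zeros(10), 0.5 * x])   # delayed, scaled copy of x
print(np.max(normalized_xcorr(x, y)))         # ~1.0 despite the shift and scaling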
I think High Performance Mark's suggestion is the standard way of doing the job.
A computationally lightweight alternative measure might be a dot product:
Split both arrays into the same predefined index intervals.
Consider the array elements in each interval as vector coordinates in a high-dimensional space.
Compute the dot product of both vectors.
The dot product will not be negative. If the two vectors are perpendicular in their vector space, the dot product will be 0 (in fact that's how 'perpendicular' is usually defined in higher dimensions), and it will attain its maximum for identical vectors.
If you accept the geometric notion of perpendicularity as a (dis)similarity measure, here you go.
Caveat: this is an ad hoc heuristic chosen for computational efficiency. I cannot tell you about the mathematical/statistical properties of the process and its separation properties; if you need rigorous analysis, you'll probably fare better with correlation theory anyway and should perhaps forward your question to math.stackexchange.com.
My Attempt:
Total_sum=0
1. For each index i in the range (m,n)
2. sum=0
3. k=Array1[i]*Array2[i]; t1=magnitude(Array1[i]); t2=magnitude(Array2[i]);
4. k=k/(t1*t2)
5. sum=sum+k
6. Total_sum=Total_sum+sum
Coefficient=Total_sum/(m-n)
If all values are equal, then sum would return 1 in each case and total_sum would return (m-n)*(1). Hence, when the same is divided by (m-n) we get the value as 1. If the graphs are exact opposites, we get -1 and for other variations a value between -1 and 1 is returned.
This is not so efficient when the y range or the x range is huge. But, I just wanted to give you an idea.
Another option would be to perform an extensive xnor.
1. For each index i in the range (m,n)
2. sum=1
3. k=Array1[i] xnor Array2[i];
4. k=k/((pow(2,number_of_bits))-1) //This will scale k down to a value between 0 and 1
5. sum=(sum+k)/2
Coefficient=sum
Is this helpful ?
You can define a distance metric for two vectors A and B of length N containing numbers in the interval [-1, 1], e.g. as
sum = 0
for i in 0 to N - 1:
    d = (A[i] - B[i])^2   // each term is in range 0 .. 4
    sum = sum + d
sum = (sum / 4) / N       // now in range 0 .. 1
This now returns distance 1 for vectors that are completely opposite (one is all 1, the other all -1), and 0 for identical vectors.
You can translate this into your coefficient by
coeff = 1 - sum
However, this is a crude approach because it does not take into account the fact that there could be horizontal distortion or shift between the signals you want to compare, so let's look at some approaches for coping with that.
You can sort both your arrays (e.g. in ascending order) and then calculate the distance / coefficient. This returns more similarity than the original metric, and is agnostic towards permutations / shifts of the signal.
You can also calculate the differentials and calculate distance / coefficient for those, and then you can do that sorted also. Using differentials has the benefit that it eliminates vertical shifts. Sorted differentials eliminate horizontal shift but still recognize different shapes better than sorted original data points.
You can then, e.g., average the different coefficients. Here is more complete code. The routine below calculates the coefficient for arrays A and B of the given size, taking d differentials (recursively) first. If sorted is true, the final (differentiated) arrays are sorted.
procedure calc(A, B, size, d, sorted):
    if (d > 0):
        A' = new array[size - 1]
        B' = new array[size - 1]
        for i in 0 to size - 2:
            A'[i] = (A[i + 1] - A[i]) / 2   // keep in range -1..1 by dividing by 2
            B'[i] = (B[i + 1] - B[i]) / 2
        return calc(A', B', size - 1, d - 1, sorted)
    else:
        if (sorted):
            A = sort(A)
            B = sort(B)
        sum = 0
        for i in 0 to size - 1:
            sum = sum + (A[i] - B[i]) * (A[i] - B[i])
        sum = (sum / 4) / size
        return 1 - sum                       // return the coefficient

procedure similarity(A, B, size):
    a = 0
    a = a + calc(A, B, size, 0, false)
    a = a + calc(A, B, size, 0, true)
    a = a + calc(A, B, size, 1, false)
    a = a + calc(A, B, size, 1, true)
    return a / 4                             // take the average
For something completely different, you could also run Fourier transform using FFT and then take a distance metric on the returning spectra.
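For instance, a minimal numpy sketch of that spectral comparison (again my own illustration, not a drop-in solution): comparing FFT magnitudes makes the measure insensitive to shifts, because a (circular) shift only changes the phases.

import numpy as np

def spectral_distance(a, b):
    # Compare two signals by the magnitudes of their FFTs, normalized to unit energy.
    A = np.abs(np.fft.rfft(a))
    B = np.abs(np.fft.rfft(b))
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    return np.linalg.norm(A - B)   # 0 for identical magnitude spectra

x = np.sin(np.linspace(0, 4 * np.pi, 100))
print(spectral_distance(x, np.roll(x, 7)))   # ~0: a circular shift leaves the magnitudes unchanged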

convert real number to radicals

Suppose I have a real number. I want to approximate it with something of the form a+sqrt(b) for integers a and b. But I don't know the values of a and b. Of course I would prefer to get a good approximation with small values of a and b. Let's leave it undefined for now what is meant by "good" and "small". Any sensible definitions of those terms will do.
Is there a sane way to find them? Something like the continued fraction algorithm for finding fractional approximations of decimals. For more on the fractions problem, see here.
EDIT: To clarify, it is an arbitrary real number. All I have are a bunch of its digits. So depending on how good of an approximation we want, a and b might or might not exist. Brute force is naturally not a particularly good algorithm. The best I can think of would be to start adding integers to my real, squaring the result, and seeing if I come close to an integer. Pretty much brute force, and not a particularly good algorithm. But if nothing better exists, that would itself be interesting to know.
EDIT: Obviously b has to be zero or positive. But a could be any integer.
No need for continued fractions; just calculate the square root of all "small" values of b (up to whatever value you feel is still "small" enough), remove everything before the decimal point, and sort/store them all (along with the b that generated each).
Then, when you need to approximate a real number, find the radical whose decimal portion is closest to the real number's decimal portion. This gives you b; choosing the correct a is then a simple matter of subtraction.
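As a rough sketch of that idea in Python (my own illustration; bmax is just an arbitrary cutoff for what counts as "small"):

import bisect, math

def build_table(bmax):
    # Precompute the fractional part of sqrt(b) for every "small" b, sorted by fraction.
    entries = sorted((math.sqrt(b) % 1.0, b) for b in range(1, bmax + 1))
    fracs = [e[0] for e in entries]
    return fracs, entries

def approx_as_radical(r, fracs, entries):
    # Return (a, b) with r ~ a + sqrt(b), matching fractional parts via binary search.
    target = r % 1.0
    i = bisect.bisect_left(fracs, target)
    candidates = [entries[j] for j in (i - 1, i) if 0 <= j < len(entries)]
    frac, b = min(candidates, key=lambda e: abs(e[0] - target))
    a = round(r - math.sqrt(b))
    return a, b

fracs, entries = build_table(200)
a, b = approx_as_radical(math.pi, fracs, entries)
print(a, b, a + math.sqrt(b))   # a = -4, b = 51 -> 3.14142842...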
This is actually more of a math problem than a computer problem, but to answer the question I think you are right that you can use continued fractions. What you do is first represent the target number as a continued fraction. For example, if you want to approximate pi (3.14159265) then the CF is:
3: 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14 ...
The next step is to create a table of CFs for square roots; then you compare the values in the table to the fractional part of the target value (here: 7, 15, 1, 292, 1, 1, 1, ...). For example, let's say your table had square roots for 1-99 only. Then you would find the closest match to be sqrt(51), which has a CF of 7: 7,14 repeating. The 7,14 is the closest to pi's 7,15. Thus your answer would be:
sqrt(51)-4
as the closest approximation given b < 100, which is off by 0.00016. If you allow larger values of b then you can get a better approximation.
The advantage of using CFs is that it is faster than working in, say, doubles or using floating point. For example, in the above case you only have to compare two integers (7 and 15), and you can also use indexing to make finding the closest entry in the table very fast.
This can be done using mixed integer quadratic programming very efficiently (though there are no run-time guarantees as MIQP is NP-complete.)
Define:
d := the real number you wish to approximate
b, a := two integers such that a + sqrt(b) is as "close" to d as possible
r := (d - a)^2 - b, is the residual of the approximation
The goal is to minimize r. Setup your quadratic program as:
x := [ s b t ]

     | 1 0 0 |
D := | 0 0 0 |
     | 0 0 0 |

c := [ 0 -1 0 ]^T
with the constraint that s - t = f (where f is the fractional part of d)
and b,t are integers (s is not)
This is a convex (therefore optimally solvable) mixed integer quadratic program since D is positive semi-definite.
Once s, b, t are computed, simply derive the answer: b is b, a = d - s (since s = d - a), and t can be ignored.
Your problem may be NP-complete; it would be interesting to prove whether it is.
Some of the previous answers use methods that are of time or space complexity O(n), where n is the largest “small number” that will be accepted. By contrast, the following method is O(sqrt(n)) in time, and O(1) in space.
Suppose that positive real number r = x + y, where x=floor(r) and 0 ≤ y < 1. We want to approximate r by a number of the form a + √b. If x+y ≈ a+√b then x+y-a ≈ √b, so √b ≈ h+y for some integer offset h, and b ≈ (h+y)^2. To make b an integer, we want to minimize the fractional part of (h+y)^2 over all eligible h. There are at most √n eligible values of h. See following python code and sample output.
import math, random

def findb(y, rhi):
    bestb = loerror = 1
    for r in range(2, rhi):
        v = (r + y)**2
        u = round(v)
        err = abs(v - u)
        if round(math.sqrt(u))**2 == u: continue
        if err < loerror:
            bestb, loerror = u, err
    return bestb

#random.seed(123456) # set a seed if testing repetitively
f = [math.pi - 3] + sorted([random.random() for i in range(24)])
print (' frac sqrt(b) error b')
for frac in f:
    b = findb(frac, 12)
    r = math.sqrt(b)
    t = math.modf(r)[0] # Get fractional part of sqrt(b)
    print ('{:9.5f} {:9.5f} {:11.7f} {:5.0f}'.format(frac, r, t - frac, b))
(Note 1: This code is in demo form; the parameters to findb() are y, the fractional part of r, and rhi, the square root of the largest small number. You may wish to change usage of parameters. Note 2: The
if round(math.sqrt(u))**2 == u: continue
line of code prevents findb() from returning perfect-square values of b, except for the value b=1, because no perfect square can improve upon the accuracy offered by b=1.)
Sample output follows. About a dozen lines have been elided in the middle. The first output line shows that this procedure yields b=51 to represent the fractional part of pi, which is the same value reported in some other answers.
     frac   sqrt(b)       error     b
  0.14159   7.14143  -0.0001642    51
  0.11975   4.12311   0.0033593    17
  0.12230   4.12311   0.0008085    17
  0.22150   9.21954  -0.0019586    85
  0.22681  11.22497  -0.0018377   126
  0.25946   2.23607  -0.0233893     5
  0.30024   5.29150  -0.0087362    28
  0.36772   8.36660  -0.0011170    70
  0.42452   8.42615   0.0016309    71
  ...
  0.93086   6.92820  -0.0026609    48
  0.94677   8.94427  -0.0024960    80
  0.96549  11.95826  -0.0072333   143
  0.97693  11.95826  -0.0186723   143
With the following code added at the end of the program, the output shown below also appears. This shows closer approximations for the fractional part of pi.
frac, rhi = math.pi - 3, 16
print (' frac sqrt(b) error b bMax')
while rhi < 1000:
    b = findb(frac, rhi)
    r = math.sqrt(b)
    t = math.modf(r)[0] # Get fractional part of sqrt(b)
    print ('{:11.7f} {:11.7f} {:13.9f} {:7.0f} {:7.0f}'.format(frac, r, t - frac, b, rhi**2))
    rhi = 3 * rhi // 2  # integer division keeps rhi usable as a range() bound
       frac      sqrt(b)          error        b     bMax
  0.1415927    7.1414284   -0.000164225       51      256
  0.1415927    7.1414284   -0.000164225       51      576
  0.1415927    7.1414284   -0.000164225       51     1296
  0.1415927    7.1414284   -0.000164225       51     2916
  0.1415927    7.1414284   -0.000164225       51     6561
  0.1415927  120.1415831   -0.000009511    14434    14641
  0.1415927  120.1415831   -0.000009511    14434    32761
  0.1415927  233.1415879   -0.000004772    54355    73441
  0.1415927  346.1415895   -0.000003127   119814   164836
  0.1415927  572.1415909   -0.000001786   327346   370881
  0.1415927  911.1415916   -0.000001023   830179   833569
I do not know if there is any kind of standard algorithm for this kind of problem, but it does intrigue me, so here is my attempt at developing an algorithm that finds the needed approximation.
Call the real number in question r. First, I assume that a can be negative; in that case we can reduce the problem, and now only have to find a b such that the decimal part of sqrt(b) is a good approximation of the decimal part of r. Let us now write r as r = x.y, with x being the integer part and y the decimal part.
Now:
b = r^2
  = (x.y)^2
  = (x + .y)^2
  = x^2 + 2 * x * .y + .y^2
  = 2 * x * .y + .y^2    (mod 1)
We now only have to find an x such that 0 = .y^2 + 2 * x * .y (mod 1) (approximately).
Filling that x into the formulas above, we get b, and can then calculate a as a = r - sqrt(b). (All of these calculations have to be carefully rounded, of course.)
Now, for the time being I am not sure if there is a way to find this x without brute forcing it. But even then, one can simply use a simple loop to find an x that is good enough.
I am thinking of something like this (semi-pseudocode):
max_diff_low = 0.01 // arbitrary accuracy
max_diff_high = 1 - max_diff_low

y = r % 1
v = y^2
addend = 2 * y
x = 0

while (v < max_diff_high && v > max_diff_low)
    x++
    v = (v + addend) % 1

c = (x + y) ^ 2
b = round(c)
a = round(r - (x + y))
Now, I think this algorithm is fairly efficient, while also allowing you to specify the desired accuracy of the approximation. One thing that could be done to turn it into an O(1) algorithm is calculating all the x values and putting them into a lookup table. If one only cares about the first three decimal digits of r (for example), the lookup table would only have 1000 values, which is only 4 kB of memory (assuming 32-bit integers are used).
Hope this is helpful at all. If anyone finds anything wrong with the algorithm, please let me know in a comment and I will fix it.
EDIT:
Upon reflection I retract my claim of efficiency. There is in fact as far as I can tell no guarantee that the algorithm as outlined above will ever terminate, and even if it does, it might take a long time to find a very large x that solves the equation adequately.
One could maybe keep track of the best x found so far and relax the accuracy bounds over time to make sure the algorithm terminates quickly, at the possible cost of accuracy.
These problems are of course non-existent, if one simply pre-calculates a lookup table.

How to implement Random(a,b) with only Random(0,1)? [duplicate]

Possible Duplicate:
how to get uniformed random between a, b by a known uniformed random function RANDOM(0,1)
In the book Introduction to Algorithms, there is this exercise:
Describe an implementation of the procedure Random(a, b) that only makes calls to Random(0,1). What is the expected running time of your procedure, as a function of a and b? The result of Random(a,b) should be uniformly distributed, just like Random(0,1).
For the Random function, the results are integers between a and b, inclusive. For example, Random(0,1) generates either 0 or 1; Random(a, b) generates a, a+1, a+2, ..., b.
My solution is like this:
for i = 1 to b-a
    r = a + Random(0,1)
return r
the running time is T=b-a
Is this correct? Are the results of my solutions uniformly distributed?
Thanks
What if my new solution is like this:
r = a
for i = 1 to b - a   // including b-a
    r += Random(0,1)
return r
If it is not correct, why does r += Random(0,1) make r not uniformly distributed?
Others have explained why your solution doesn't work. Here's the correct solution:
1) Find the smallest number, p, such that 2^p > b-a.
2) Perform the following algorithm:
   r = 0
   for i = 1 to p
       r = 2*r + Random(0,1)
3) If r is greater than b-a, go to step 2.
4) Your result is r+a
So let's try Random(1,3).
So b-a is 2.
2^1 = 2, so p will have to be 2 so that 2^p is greater than 2.
So we'll loop two times. Let's try all possible outputs:
00 -> r=0, 0 is not > 2, so we output 0+1 or 1.
01 -> r=1, 1 is not > 2, so we output 1+1 or 2.
10 -> r=2, 2 is not > 2, so we output 2+1 or 3.
11 -> r=3, 3 is > 2, so we repeat.
So 1/4 of the time, we output 1. 1/4 of the time we output 2. 1/4 of the time we output 3. And 1/4 of the time we have to repeat the algorithm a second time. Looks good.
Note that if you have to do this a lot, two optimizations are handy:
1) If you use the same range a lot, have a class that computes p once so you don't have to compute it each time.
2) Many CPUs have fast ways to perform step 1 that aren't exposed in high-level languages. For example, x86 CPUs have the BSR instruction.
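For reference, here is a small Python sketch of this procedure (my own illustration; random01() just stands in for the given Random(0,1) primitive):

import random
from collections import Counter

def random01():
    # Stand-in for the given Random(0,1) primitive.
    return random.randint(0, 1)

def random_ab(a, b):
    # Uniform integer in [a, b] built only from random01(), by rejection.
    n = b - a                        # we need a uniform value in 0..n
    p = n.bit_length()               # smallest p with 2^p > n (for n >= 1)
    while True:
        r = 0
        for _ in range(p):
            r = 2 * r + random01()   # build a p-bit number
        if r <= n:                   # keep it only if it lands in range
            return a + r

print(Counter(random_ab(1, 3) for _ in range(30000)))   # roughly equal counts for 1, 2, 3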
No, it's not correct; that method will concentrate around (a+b)/2. It's a binomial distribution.
Are you sure that Random(0,1) produces integers? It would make more sense if it produced floating point values between 0 and 1. Then the solution would be an affine transformation, with running time independent of a and b.
An idea I just had, in case it's about integer values: use bisection. At each step, you have a range low-high. If Random(0,1) returns 0, the next range is low-(low+high)/2, else (low+high)/2-high.
Details and complexity left to you, since it's homework.
That should create (approximately) a uniform distribution.
Edit: approximately is the important word there. Uniform if b-a+1 is a power of 2, not too far off if it's close, but not good enough generally. Ah, well it was a spontaneous idea, can't get them all right.
No, your solution isn't correct. This sum will have a binomial distribution.
However, you can generate a purely random sequence of 0s and 1s and treat it as a binary number.
repeat
    result = a
    steps = ceiling(log(b - a))
    for i = 0 to steps
        result += (2 ^ i) * Random(0, 1)
until result <= b
KennyTM: my bad.
I read the other answers. For fun, here is another way to find the random number:
Allocate an array with b-a elements.
Set all the values to 1.
Iterate through the array. For each nonzero element, flip the coin, as it were. If it came up 0, set the element to 0.
Whenever, after a complete iteration, you have only 1 element remaining, you have your random number: a+i, where i is the index of the nonzero element (assuming we start indexing at 0). All numbers are then equally likely. (You would have to deal with the case where it's a tie, but I leave that as an exercise for you.)
This would have O(infinity) ... :)
On average, though, half the numbers would be eliminated, so it would have an average case running time of log_2 (b-a).
First of all, I assume you are actually accumulating the result, not adding 0 or 1 to a on each step.
Using some probabilities you can prove that your solution is not uniformly distributed. The chance that the resulting value r is (a+b)/2 is greatest. For instance, if a is 0 and b is 7, the chance that you get the value 4 is (combination 4 of 7) divided by 2 raised to the power 7. The reason is that no matter which 4 out of the 7 values are 1, the result will still be 4.
The running time you estimate is correct.
Your solution's pseudocode should look like:
r = a
for i = 0 to b-a
    r += Random(0,1)
return r
As for uniform distribution: assuming that the random implementation this random number generator is based on is perfectly uniform, the odds of getting 0 or 1 are 50%. Therefore getting the number you want is the result of that choice being made over and over again.
So for a=1, b=5, there are 5 choices made.
The odds of getting 1 involve 5 decisions, all 0; the odds of that are 0.5^5 = 3.125%.
The odds of getting 5 involve 5 decisions, all 1; the odds of that are 0.5^5 = 3.125%.
As you can see from this, the distribution is not uniform: the odds of any number should be 20%.
In the algorithm you created, it is really not equally distributed.
The result "r" will always be either "a" or "a+1". It will never go beyond that.
It should look something like this:
r = 0;
for i = 0 to b-a
    r = a + r + Random(0,1)
return r;
By including "r" into your computation, you are including the "randomness" of all the previous "for" loop runs.
