Mapping function for two integers - algorithm

So,
The problem
I have two integers: in the first case both are positive, in the second case they may be any integers. I need to create a mapping function F from them to some other integer value, which must be:
An integer value. For the first case (x>0, y>0), a positive integer value
Symmetric. That means F(x, y) = F(y, x)
Unique. That means F(x0, y0) = F(x1, y1) <=> (x0 = x1 ^ y0 = y1) V (y0 = x1 ^ x0 = y1)
My approach
At first glance, for positive integers we could use an expression like F(x, y) = x^2 + y^2, but that fails: for example, 89^2 + 23^2 = 13^2 + 91^2. The second (general) case is even more complicated.
Use-case
That may be useful when dealing with things that are supposed to be order-independent and need to be unique. For example, if we want to find the Cartesian product of many arrays and want the result to be unique regardless of order, i.e. <x,z,y> is equal to <x,y,z>. It may be done with:
function decartProductPair($one, $two, $unique = false)
{
    $result = [];
    for ($i = 0; $i < count($one); $i++) {
        for ($j = 0; $j < count($two); $j++) {
            if ($unique) {
                if ($i != $j) {
                    $result[$i*$i + $j*$j] = array_merge((array)$one[$i], (array)$two[$j]);
                    //      ^
                    //      |
                    //      +---- this is the place where F(i, j) is needed
                }
            } else {
                $result[] = array_merge((array)$one[$i], (array)$two[$j]);
            }
        }
    }
    return array_values($result);
}
Another use-case is to properly group sender and receiver in some SQL table, so that different sender/receiver pairs are distinguished while each pair stays symmetric. Something like:
SELECT
    COUNT(1) AS message_count,
    sender,
    receiver
FROM
    test
GROUP BY
    -- this is the place where F(sender, receiver) is needed:
    sender*sender + receiver*receiver
(By posting samples I wanted to show that the issue is certainly related to programming.)
The question
As mentioned, the question is: what can be used as F? I want F to be as simple as possible. Keep in mind two cases:
Integer x>0, y>0. F(x,y) > 0
Any integer x, y and so any integer F(x,y) as a result
Maybe F isn't just an expression but some algorithm to find the desired result for any x, y (hence the algorithm tag). However, an expression is better, since it's more likely that an expression can be used in SQL or PHP or whatever. Feel free to edit the tags, because I'm not sure two tags are enough.

The simplest solution: f(x, y) = x^5 + y^5
No positive integer is known that can be written as the sum of two fifth powers in more than one way.
As of now, this is an unsolved math problem.

You need a MAX_INTEGER constant, and the result will need to hold MAX_INTEGER**2 (say, be a long if both inputs are ints). In that case, one such function is:
f(x,y) = min(x,y)*MAX_INTEGER + max(x,y)
But I propose a different solution: use a hash function (say md5) of the string resulting from the concatenation of str(min(x,y)), a separator (say ".") and str(max(x,y)). That is:
f(x,y) = md5(str(min(x,y)) + "." + str(max(x,y)))
It is not unique, but collisions are very rare, and probably acceptable for most use cases. If you are still worried about collisions, store the actual {x, y} pair along with f(x, y), and check whether a collision happened.
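A short Python sketch of both ideas, the MAX_INTEGER scheme and the hash variant; the bound of 2**31 is only an assumption for illustration:

```python
import hashlib

MAX_INTEGER = 2**31  # assumed bound: 0 <= x, y < MAX_INTEGER

def f_exact(x, y):
    # Unique and symmetric as long as both inputs stay below MAX_INTEGER.
    return min(x, y) * MAX_INTEGER + max(x, y)

def f_hash(x, y):
    # Symmetric by construction; unique only up to md5 collisions.
    key = f"{min(x, y)}.{max(x, y)}"
    return hashlib.md5(key.encode()).hexdigest()
```

Both satisfy f(3, 5) == f(5, 3); only f_exact is guaranteed collision-free within the bound.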

Sort input numbers and interleave their bits:
x = 5
y = 3
Step 1. Sorting: 3, 5
Step 2. Mixing bits: 11, 101 -> 1_1_, 1_0_1 -> 11011 = 27
So, F(3, 5) = 27
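A possible Python rendering of this scheme (non-negative inputs assumed), interleaving least-significant-bit first with the larger number's bits on the even positions, matching the example above:

```python
def f(x, y):
    # Sort so the result is symmetric, then interleave the bits of
    # the two numbers, least significant bit first: the larger
    # number's bits land on even positions, the smaller's on odd.
    lo, hi = sorted((x, y))
    result, pos = 0, 0
    while lo or hi:
        result |= (hi & 1) << pos
        result |= (lo & 1) << (pos + 1)
        hi >>= 1
        lo >>= 1
        pos += 2
    return result
```

De-interleaving the bits recovers the sorted pair, so the mapping is unique for unordered pairs.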

A compact representation is x*(x+3)/2 + y*(x+1) + y*(y-1)/2, which comes from an arrangement like this:
x ->
y  0  1  3  6 10 15
|     2  4  7 11 16
v        5  8 12 17
            9 13 18
              14 19
                 20

According to [Stackoverflow:mapping-two-integers-to-one-in-a-unique-and-deterministic-way][1], if we symmetrize the formula we get:
(x + y) * (x + y + 1) / 2 + min(x, y)
This should work: since
(x + y) * (x + y + 1) / 2 + x
(the Cantor pairing function) is unique over ordered pairs, the symmetrized formula is also unique over unordered pairs.
[1]: Mapping two integers to one, in a unique and deterministic way
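A quick Python check of the symmetrized formula, with a brute-force uniqueness test over a small range (non-negative inputs assumed):

```python
def f(x, y):
    # Cantor pairing applied with min(x, y): symmetric because
    # swapping x and y changes neither the sum nor the min.
    return (x + y) * (x + y + 1) // 2 + min(x, y)

# Brute-force check: no two unordered pairs may collide.
seen = {}
for x in range(50):
    for y in range(x, 50):
        key = f(x, y)
        assert key not in seen, (seen[key], (x, y))
        seen[key] = (x, y)
```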

Related

Algorithm to simplify boolean expressions

I want to simplify a very large boolean function of the form:
f(a1,a2,....,an) = (a1+a2+a5).(a2+a7+a11+a23+a34)......(a1+a3+an)
where '.' means OR
and '+' means AND.
There may be 100 such terms ('.'-ed with each other),
and n may go up to 30.
Is there any feasible algorithm to simplify this?
NOTE: this is not a lab assignment, but a small part of my project on rule generation by rough sets, where f is a dissimilarity function.
The well-known ways to do this are:
if the number of variables is less than 5, use the Karnaugh Map Algorithm
if the number of variables is 5 or more, use the Quine McCluskey Algorithm
The second way is the one most commonly used on a computer: it's tabular and straightforward. The first way is the best way to do it by hand and is more fun, but you can't use it reliably for anything more than 4 variables.
The typical method is to use boolean algebra to reduce the statement to its simplest form.
If, for example, you have something like:
(A AND B) OR (A AND C)
you can convert it to a more simple form:
A AND (B OR C)
If you represent the a values as bits of an int or long, where a1 corresponds to bit 1, a2 to bit 2, a3 to bit 3, etc.:
long a = (a1 ? 1L << 1 : 0) | (a2 ? 1L << 2 : 0) | (a3 ? 1L << 3 : 0) | ...;
(wasting bit 0 to keep it simple, and ignoring the fact that you'd be better off with an a0)
And you do the same for all of the terms:
long[] terms = ...;
terms[0] = (1L << 1) | (1L << 2) | (1L << 5);                            // a1+a2+a5
terms[1] = (1L << 2) | (1L << 7) | (1L << 11) | (1L << 23) | (1L << 34); // a2+a7+a11+a23+a34
Then you can find the result:
foreach (var term in terms)
{
    if ((a & term) == term) return true;
}
return false;
BUT this only works well for up to n = 64. Above that it's messy.
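The 64-variable limit comes from the machine word; in a language with arbitrary-precision integers the same trick works for any n. A sketch in Python (variable a_i mapped to bit i; the names are illustrative):

```python
def make_mask(indices):
    # One bit per variable index.
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask

def evaluate(assignment, terms):
    # A term (a conjunction of variables) is satisfied when every
    # one of its bits is set in the assignment.
    return any(assignment & t == t for t in terms)

terms = [make_mask([1, 2, 5]),           # a1+a2+a5
         make_mask([2, 7, 11, 23, 34])]  # a2+a7+a11+a23+a34
a = make_mask([2, 7, 11, 23, 34])        # a2, a7, a11, a23, a34 true
```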

random increasing sequence with O(1) access to any element?

I have an interesting math/CS problem. I need to sample from a possibly infinite random sequence of increasing values, X, with X(i) > X(i-1), with some distribution between them. You could think of this as the sum of a different sequence D of uniform random numbers in [0,d). This is easy to do if you start from the first one and go from there; you just add a random amount to the sum each time. But the catch is, I want to be able to get any element of the sequence in faster than O(n) time, ideally O(1), without storing the whole list. To be concrete, let's say I pick d=1, so one possibility for D (given a particular seed) and its associated X is:
D={.1, .5, .2, .9, .3, .3, .6 ...} // standard random sequence, elements in [0,1)
X={.1, .6, .8, 1.7, 2.0, 2.3, 2.9, ...} // increasing random values; partial sum of D
(I don't really care about D, I'm just showing one conceptual way to construct X, my sequence of interest.) Now I want to be able to compute the value of X[1] or X[1000] or X[1000000] equally fast, without storing all the values of X or D. Can anyone point me to some clever algorithm or a way to think about this?
(Yes, what I'm looking for is random access into a random sequence -- with two different meanings of random. Makes it hard to google for!)
Since D is pseudorandom, there’s a space-time tradeoff possible:
O(sqrt(n))-time retrievals using O(sqrt(n)) storage locations (or,
in general, O(n**alpha)-time retrievals using O(n**(1-alpha))
storage locations). Assume zero-based indexing and that
X[n] = D[0] + D[1] + ... + D[n-1]. Compute and store
Y[s] = X[s**2]
for all s**2 <= n in the range of interest. To look up X[n], let
s = floor(sqrt(n)) and return
Y[s] + D[s**2] + D[s**2+1] + ... + D[n-1].
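A sketch of this checkpointing scheme in Python; the hash-based D is just one assumed way to get a deterministic, index-addressable pseudorandom sequence:

```python
import hashlib

def D(i, d=1.0):
    # Deterministic pseudorandom D[i] in [0, d), derived from the index.
    h = hashlib.sha256(str(i).encode()).digest()
    return d * int.from_bytes(h[:8], 'big') / 2**64

class PrefixSums:
    def __init__(self, n_max):
        # Store checkpoints Y[s] = X[s*s] for all s*s <= n_max.
        self.Y = []
        total = 0.0
        s = 0
        for i in range(n_max + 1):
            if i == s * s:
                self.Y.append(total)
                s += 1
            total += D(i)

    def X(self, n):
        # X[n] = D[0] + ... + D[n-1], resumed from the nearest checkpoint.
        s = int(n ** 0.5)
        total = self.Y[s]
        for i in range(s * s, n):
            total += D(i)
        return total
```

Lookups then cost O(sqrt(n)) additions instead of O(n).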
EDIT: here's the start of an approach based on the following idea.
Let Dist(1) be the uniform distribution on [0, d) and let Dist(k) for k > 1 be the distribution of the sum of k independent samples from Dist(1). We need fast, deterministic methods to (i) pseudorandomly sample Dist(2**p) and (ii) given that X and Y are distributed as Dist(2**p), pseudorandomly sample X conditioned on the outcome of X + Y.
Now imagine that the D array constitutes the leaves of a complete binary tree of size 2**q. The values at interior nodes are the sums of the values at their two children. The naive way is to fill the D array directly, but then it takes a long time to compute the root entry. The way I'm proposing is to sample the root from Dist(2**q). Then, sample one child according to Dist(2**(q-1)) given the root's value. This determines the value of the other, since the sum is fixed. Work recursively down the tree. In this way, we look up tree values in time O(q).
Here's an implementation for Gaussian D. I'm not sure it's working properly.
import hashlib, math

def random_oracle(seed):
    # Deterministic pseudorandom float in [0, 1) derived from the seed.
    h = hashlib.sha512()
    h.update(str(seed).encode())
    x = 0.0
    for b in h.digest():
        x = (x + b) / 256.0
    return x

def sample_gaussian(variance, seed):
    # Box-Muller transform driven by the random oracle.
    u0 = random_oracle(2 * seed)
    u1 = random_oracle(2 * seed + 1)
    return math.sqrt(-2.0 * variance * math.log(1.0 - u0)) * math.cos(2.0 * math.pi * u1)

def sample_children(sum_outcome, sum_variance, seed):
    # Split a sum of two iid Gaussians into its two halves.
    difference_outcome = sample_gaussian(sum_variance, seed)
    return ((sum_outcome + difference_outcome) / 2.0,
            (sum_outcome - difference_outcome) / 2.0)

def sample_X(height, i):
    assert 0 <= i <= 2 ** height
    total = 0.0
    z = sample_gaussian(2 ** height, 0)
    seed = 1
    for j in range(height, 0, -1):
        x, y = sample_children(z, 2 ** j, seed)
        assert abs(x + y - z) <= 1e-09
        seed *= 2
        if i >= 2 ** (j - 1):
            i -= 2 ** (j - 1)
            total += x
            z = y
            seed += 1
        else:
            z = x
    return total

def test(height):
    X = [sample_X(height, i) for i in range(2 ** height + 1)]
    D = [X[i + 1] - X[i] for i in range(2 ** height)]
    mean = sum(D) / len(D)
    variance = sum((d - mean) ** 2 for d in D) / (len(D) - 1)
    print(mean, math.sqrt(variance))
    D.sort()
    with open('data', 'w') as f:
        for d in D:
            print(d, file=f)

if __name__ == '__main__':
    test(10)
If you do not record the values in X, and do not remember the values of X you have previously generated, there is no way to guarantee that the elements of X you generate on the fly will be in increasing order. Furthermore, there seems to be no way to avoid O(n) worst-case time per query unless you can quickly generate the CDF of the sum of the first m random variables in D for any choice of m.
If you want the ith value X(i) from a particular realization, I can't see how you could do this without generating the sequence up to i. Perhaps somebody else can come up with something clever.
Would you be willing to accept a value which is plausible in the sense that it has the same distribution as the X(i)'s you would observe across multiple realizations of the X process? If so, it should be pretty easy. X(i) will be asymptotically normally distributed with mean i/2 (since it's the sum of the Dk's for k=1,...,i, the D's are Uniform(0,1), and the expected value of a D is 1/2) and variance i/12 (since the variance of a D is 1/12 and the variance of a sum of independent random variables is the sum of their variances).
Because of the asymptotic aspect, I'd pick some threshold value of i at which to switch from direct summing to using the normal approximation. For example, with a threshold of i = 12 you would actually sum uniforms for values of i from 1 to 11, and generate a Normal(i/2, sqrt(i/12)) value for i >= 12. That's an O(1) algorithm, since the total work is bounded by your threshold, and the results produced will be distributionally representative of what you would see if you actually went through the summing.
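A sketch of that threshold scheme in Python (the per-index seeding convention is an assumption; any deterministic scheme works). Note each X(i) has the right marginal distribution, but consecutive values do not form one consistent increasing realization:

```python
import math, random

THRESHOLD = 12  # below this, sum the uniforms directly

def X(i, seed=0):
    # Deterministic per-index sampling: exact summing for small i,
    # Normal(i/2, sqrt(i/12)) approximation for large i.
    rng = random.Random(f"{seed}:{i}")
    if i < THRESHOLD:
        return sum(rng.random() for _ in range(i))
    return rng.gauss(i / 2, math.sqrt(i / 12))
```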

Solving linear equations represented as a string

I'm given a string 2*x + 5 - (3*x-2)=x + 5 and I need to solve for x. My thought process is that I'd convert it to an expression tree, something like,
              =
            /   \
           -     +
          / \   / \
         +   -  x  5
        / \ / \
       *  5 *  2
      / \  / \
     2  x 3  x
But how do I actually reduce the tree from here? Any other ideas?
You have to reduce it using axioms from algebra, such as
a * (b + c) -> (a * b) + (a * c)
This is done by checking the type of each node in the parse tree. Once the expression is fully expanded into terms, you can then check that they are actually linear, etc.
The values in the tree will be either variables or numbers. It isn't very neat to represent these as classes inheriting from some AbstractTreeNode class, however, because C++ doesn't have multiple dispatch. So it is better to do it the 'C' way:
enum NodeType {
    Number,
    Variable,
    Operation  // to represent the + and *
};
struct Node {
    NodeType type;
    // union {char*; int; Node*[2]}  // pseudocode, but you need something
    //                               // like this for the variable name ("x"),
    //                               // the numerical value, and the children
};
Now you can query the types of a node and its children using switch/case.
As I said earlier, idiomatic C++ code would use virtual functions, but lacks the multiple dispatch needed to solve this cleanly. (You would need to store the type anyway.)
Then you group terms, etc. and solve the equation.
You can have rules to normalise the tree, for example
constant + variable -> variable + constant
would always put x on the left of a term. Then x * 2 + x * 4 could be simplified more easily:
var * constant + var * constant -> (sum of constants) * var
In your example...
First, simplify the '=' by moving all terms to one side (as per the rule above).
The right-hand side becomes -1 * (x + 5), which expands to -1 * x + -1 * 5. The left-hand side is harder: consider replacing a - b with a + -1 * b.
Eventually,
2x + 5 + -3x + 2 + -x + -5 = 0
Then you can group terms whichever way you want (by scanning along, etc.):
(2 + -3 + -1) x + 5 + 2 + -5 = 0
Sum them up, and when you have mx + c, solve it.
Assuming you have a first-order equation, check all the leaves on each side. On each side, keep two bins: one summing all the leaves that are multiples of x, and one for all the constant leaves. Add to (or multiply) each bin as you step up the tree along each branch from the leaves. You will end up with something that is conceptually like
a*x + b = c*x + d
At that point, you can just solve
x = (d - b) / (a - c)
Assuming the equation reduces to f(x) = 0 with f(x) = a * x + b:
You can transform every leaf in the expression tree into that form, for example 2 -> 0 * x + 2 and 3 * x -> 3 * x + 0, then carry out the arithmetic operations on these linear forms up the expression tree, and finally solve the equation f(x) = 0.
If the function is much more complicated than a polynomial, you can do a binary search on x, using the expression tree to evaluate the left and right sides of the equation.
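This linear-form propagation can be sketched in Python by representing each subtree as a pair (a, b) meaning a*x + b; the tuple-based tree encoding here is just an assumption for illustration:

```python
def linear(node):
    # Evaluate an expression tree to the linear form (a, b) = a*x + b.
    if node == 'x':
        return (1, 0)
    if isinstance(node, (int, float)):
        return (0, node)
    op, left, right = node
    a1, b1 = linear(left)
    a2, b2 = linear(right)
    if op == '+':
        return (a1 + a2, b1 + b2)
    if op == '-':
        return (a1 - a2, b1 - b2)
    if op == '*':
        # Only linear results are allowed: one side must be constant.
        if a1 == 0:
            return (b1 * a2, b1 * b2)
        if a2 == 0:
            return (a1 * b2, b1 * b2)
        raise ValueError('non-linear product')

def solve(lhs, rhs):
    # lhs = rhs  ->  (a1 - a2)*x = b2 - b1  ->  x = (b2 - b1)/(a1 - a2)
    a1, b1 = linear(lhs)
    a2, b2 = linear(rhs)
    return (b2 - b1) / (a1 - a2)

# 2*x + 5 - (3*x - 2) = x + 5
lhs = ('-', ('+', ('*', 2, 'x'), 5), ('-', ('*', 3, 'x'), 2))
rhs = ('+', 'x', 5)
```

For the question's equation, solve(lhs, rhs) returns 1.0.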

Is it possible to compute the minimum of three numbers by using two comparisons at the same time?

I've been trying to think up of some way that I could do two comparisons at the same time to find the greatest/least of three numbers. Arithmetic operations on them are considered "free" in this case.
That is to say, the classical way of finding the greater of two, and then comparing it to the third number isn't valid in this case because one comparison depends on the result of the other.
Is it possible to use two comparisons where this isn't the case? I was thinking maybe comparing the differences of the numbers somehow or their products or something, but came up with nothing.
Just to reemphasize, two comparisons are still done, just that neither comparison relies on the result of the other comparison.
Great answers so far, thanks guys
Ignoring the possibility of equal values ("ties"), there are 3! = 6 possible orderings of three items. A comparison yields exactly one bit, so two comparisons can only encode 2*2 = 4 possible configurations, and 4 < 6. In other words: you cannot decide the order of three items using two fixed comparisons.
Using a truth table:
a b c | min | a<b a<c b<c | condition needed using only a<b and a<c
------+-----+-------------+---------------------------------------
1 2 3 |  a  |  1   1   1  | (ab==1 && ac==1)
1 3 2 |  a  |  1   1   0  | ...
2 1 3 |  b  |  0   1   1  | (ab==0 && ac==1)
3 1 2 |  b  |  0   0   1  | (ab==0 && ac==0) <<--- (*)
2 3 1 |  c  |  1   0   0  | (ab==1 && ac==0)
3 2 1 |  c  |  0   0   0  | (ab==0 && ac==0) <<--- (*)
As you can see, you cannot distinguish the two cases marked with (*) when using only the a<b and a<c comparisons (choosing another set of two comparisons fails similarly, by symmetry).
It is a pity: we fail to encode the three possible outcomes using only two bits. (We could, but only with a third comparison, or by choosing the second comparison based on the outcome of the first.)
I think it's possible (the following is for the min, according to the original form of the question):
B_lt_A = B < A
C_lt_min_A_B = C < (A + B - abs(A - B)) / 2
and then you combine these (I have to write it sequentially, but this is rather a 3-way switch):
if (C_lt_min_A_B) then C is the min
else if (B_lt_A) then B is the min
else A is the min
You might argue that the abs() implies a comparison, but that depends on the hardware. There is a trick to do it without comparison for integers. For IEEE 754 floating point it's just a matter of forcing the sign bit to zero.
Regarding (A + B - abs(A - B)) / 2: this is (A + B) / 2 - abs(A - B) / 2, i.e., the minimum of A and B is half the distance between A and B down from their midpoint. This can be applied again to yield min(A,B,C), but then you lose the identity of the minimum, i.e., you only know the value of the minimum, but not where it comes from.
One day we may find that parallelizing the 2 comparisons gives a better turnaround time, or even throughput, in some situation. Who knows, maybe for some vectorization, or for some MapReduce, or for something we don't know about yet.
If you were only talking integers, I think you can do it with zero comparisons using some math and a bit fiddle. Given three int values a, b, and c:
int d = ((a + b) - Abs(a - b)) / 2; // find d = min(a,b)
int e = ((d + c) - Abs(d - c)) / 2; // find min(d,c)
with Abs(x) implemented as
int Abs(int x) {
int mask = x >> 31;
return (x + mask) ^ mask;
}
Not extensively tested, so I may have missed something. Credit for the Abs bit twiddle goes to these sources
How to compute the integer absolute value
http://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs
From Bit Twiddling Hacks
r = y ^ ((x ^ y) & -(x < y)); // min(x, y)
min = r ^ ((z ^ r) & -(z < r)); // min(z, r)
Two comparisons!
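Since Python's True is the integer 1, the bit twiddle above can be tried verbatim (note that the second comparison z < r still uses the result r of the first, so strictly speaking these two comparisons are not independent):

```python
def min3(x, y, z):
    # -(x < y) is 0 or -1 (all bits set), so the XOR-mask picks
    # min(x, y) without a branch.
    r = y ^ ((x ^ y) & -(x < y))        # r = min(x, y)
    return r ^ ((z ^ r) & -(z < r))     # min(z, r)
```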
How about this to find the minimum:
If (b < a)
    Swap(a, b)
If (c < a)
    Swap(a, c)
Return a;
You can do this with zero comparisons in theory, assuming 2's complement number representation (and that right shifting a signed number preserves its sign).
min(a, b) = (a+b-abs(a-b))/2
abs(a) = (2*(a >> bit_depth)+1) * a
and then
min(a,b,c) = min(min(a,b),c)
This works because a >> bit_depth gives 0 for positive numbers and -1 for negative numbers, so 2*(a >> bit_depth) + 1 gives 1 for positive numbers and -1 for negative numbers. That is the signum function, and we get abs(a) = signum(a) * a.
Then it's just a matter of the min(a,b) formula. This can be demonstrated by going through the two possibilities:
case min(a,b) = a:
min(a,b) = (a+b - -(a-b))/2
min(a,b) = (a+b+a-b)/2
min(a,b) = a
case min(a,b) = b:
min(a,b) = (a+b-(a-b))/2
min(a,b) = (a+b-a+b)/2
min(a,b) = b
So the formula for min(a,b) works.
The assumptions above only apply to the abs() function, if you can get a 0-comparison abs() function for your data type then you're good to go.
For example, IEEE754 floating point data has a sign bit as the top bit so the absolute value simply means clearing that bit. This means you can also use floating point numbers.
And then you can extend this to the min of N numbers with 0 comparisons.
In practice, though, it's hard to imagine this method beating anything that isn't intentionally slowed down. This is all about using fewer than 3 independent comparisons, not about being faster than the straightforward implementation in practice.
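A Python sketch of the zero-comparison scheme; BIT_DEPTH = 63 is an assumed word size (Python's arbitrary-precision right shift is arithmetic, so this works for any |a| < 2**63):

```python
BIT_DEPTH = 63  # assumed word size

def sign_abs(a):
    # 2*(a >> BIT_DEPTH) + 1 is +1 for a >= 0 and -1 for a < 0,
    # i.e. signum(a), so this computes abs(a) with no comparison.
    return (2 * (a >> BIT_DEPTH) + 1) * a

def min2(a, b):
    # min(a, b) = (a + b - abs(a - b)) / 2, in integer arithmetic.
    return (a + b - sign_abs(a - b)) // 2

def min3(a, b, c):
    return min2(min2(a, b), c)
```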
if cos(1.5 * atan2(sqrt(3) * (B - C), 2*A - B - C)) > 0 then
    A is the max
else if cos(1.5 * atan2(sqrt(3) * (C - A), 2*B - C - A)) > 0 then
    B is the max
else
    C is the max

Dynamic programming idiom for combinations

Consider the problem in which you have a value of N and you need to calculate how many ways you can sum up to N dollars using [1,2,5,10,20,50,100] Dollar bills.
Consider the classic DP solution:
C = [1, 2, 5, 10, 20, 50, 100]

def comb(p):
    if p == 0:
        return 1
    c = 0
    for x in C:
        if x <= p:
            c += comb(p - x)
    return c
It does not take the order of the summed parts into account. For example, comb(4) will count 5 results: [1,1,1,1], [2,1,1], [1,2,1], [1,1,2], [2,2], whereas there are actually only 3 ([2,1,1], [1,2,1] and [1,1,2] are all the same).
What is the DP idiom for calculating this problem? (non-elegant solutions such as generating all possible solutions and removing duplicates are not welcome)
Not sure about any DP idioms, but you could try using Generating Functions.
What we need to find is the coefficient of x^N in
(1 + x + x^2 + ...)(1 + x^2 + x^4 + ...)(1 + x^5 + x^10 + ...)...(1 + x^100 + x^200 + ...)
(the exponent contributed is: number of times 1 appears * 1 + number of times 2 appears * 2 + ...),
which is the same as the coefficient of x^N in the reciprocal of
(1-x)(1-x^2)(1-x^5)(1-x^10)(1-x^20)(1-x^50)(1-x^100).
You can now factorize each in terms of products of roots of unity, split the reciprocal in terms of Partial Fractions (which is a one time step) and find the coefficient of x^N in each (which will be of the form Polynomial/(x-w)) and add them up.
You could do some DP in calculating the roots of unity.
You should not restart from the beginning of C each time, but at most from the denomination you used at the previous depth.
That means you have to pass two parameters: the start index and the remaining total.
C = [1, 2, 5, 10, 20, 50, 100]

def comb(p, start=0):
    if p == 0:
        return 1
    c = 0
    for i, x in enumerate(C[start:]):
        if x <= p:
            c += comb(p - x, i + start)
    return c
or equivalently (it might be more readable):
C = [1, 2, 5, 10, 20, 50, 100]

def comb(p, start=0):
    if p == 0:
        return 1
    c = 0
    for i in range(start, len(C)):
        x = C[i]
        if x <= p:
            c += comb(p - x, i)
    return c
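To make this a real DP rather than plain recursion, memoize on the (p, start) pair, e.g. with functools.lru_cache:

```python
from functools import lru_cache

C = [1, 2, 5, 10, 20, 50, 100]

@lru_cache(maxsize=None)
def comb(p, start=0):
    # Partitions of p using denominations C[start:] only, so each
    # multiset of bills is counted exactly once.
    if p == 0:
        return 1
    c = 0
    for i in range(start, len(C)):
        if C[i] <= p:
            c += comb(p - C[i], i)
    return c
```

comb(4) now counts [1,1,1,1], [1,1,2] and [2,2] once each, and repeated subproblems are solved only once.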
Terminology: what you are looking for are the "integer partitions"
of N into prescribed parts (you should replace "combinations" in the title).
Ignoring the "dynamic programming" part of the question, a routine
for your problem is given in the first section of chapter 16
("Integer partitions", p. 339ff) of the fxtbook, online at
http://www.jjj.de/fxt/#fxtbook
