Possible Duplicate:
Mapping two integers to one, in a unique and deterministic way
I'm trying to create a unique identifier for a pair of two integers (Ruby):
f(i1,i2) = f(i2, i1) = some_unique_value
So i1+i2, i1*i2, and i1^i2 are not unique, and neither is (i1>i2) ? "i1" + "i2" : "i2" + "i1".
I think the following solution will be OK:
(i1>i2) ? "i1" + "_" + "i2" : "i2" + "_" + "i1"
but:
I have to save the result in a DB and index it, so I would prefer it to be an integer, and as small as possible.
Can Zlib.crc32(f(i1,i2)) guarantee uniqueness?
Thanks.
UPD:
Actually, I'm not sure the result MUST be an integer. Maybe I can convert it to a decimal:
(i1>i2) ? i1.i2 : i2.i1
?
What you're looking for is called a Pairing function.
The illustration on the German Wikipedia page shows clearly how it works: pairs are enumerated along the diagonals.
Implemented in Ruby:
def cantor_pairing(n, m)
  # Cantor's pairing function: a bijection from pairs of
  # non-negative integers to non-negative integers
  (n + m) * (n + m + 1) / 2 + m
end

(0..5).map do |n|
  (0..5).map do |m|
    cantor_pairing(n, m)
  end
end
=> [[ 0, 2, 5, 9, 14, 20],
[ 1, 4, 8, 13, 19, 26],
[ 3, 7, 12, 18, 25, 33],
[ 6, 11, 17, 24, 32, 41],
[10, 16, 23, 31, 40, 50],
[15, 22, 30, 39, 49, 60]]
Note that you will need to store the result of this pairing in a datatype with as many bits as both your input numbers put together. (If both input numbers are 32-bit, you will need a 64-bit datatype to be able to store all possible combinations, obviously.)
No, Zlib.crc32(f(i1,i2)) is not unique for all integer values of i1 and i2.
If i1 and i2 are also 32-bit numbers, then there are many more combinations of them than can be stored in the 32-bit number returned by CRC32.
CRC32 is not unique, and wouldn't be good to use as a key. Assuming you know the maximum value of your integers i1 and i2:
unique_id = (max_i2+1)*i1 + i2
If your integers can be negative, or will never be below a certain positive integer, you'll need the max and min values:
(max_i2-min_i2+1) * (i1-min_i1) + (i2-min_i2)
This will give you the absolute smallest number possible to identify both integers.
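A quick sketch of that packing in Python (the helper name is mine, not from the answer), assuming you know the bounds up front:

def pack(i1, i2, min_i1, min_i2, max_i2):
    # shift i2 into 0..(max_i2 - min_i2) and use that range as the "base"
    return (max_i2 - min_i2 + 1) * (i1 - min_i1) + (i2 - min_i2)

# e.g. i1 = 42, i2 = 7, both known to lie in 0..99:
print(pack(42, 7, 0, 0, 99))  # -> 100 * 42 + 7 = 4207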
Well, no 4-byte hash will be unique when its input is an arbitrary binary string of more than 4 bytes. Your strings are from a highly restricted symbol set, so collisions will be fewer, but "no, not unique".
There are two ways to use a smaller integer than the possible range of values for both of your integers:
Have a system that works despite occasional collisions
Check for collisions and use some sort of rehash
The obvious way to solve your problem with a 1:1 mapping requires that you know the maximum value of one of the integers. Just multiply one by the maximum value and add the other, or determine a power of two ceiling, shift one value accordingly, then OR in the other. Either way, every bit is reserved for one or the other of the integers. This may or may not meet your "as small as possible" requirement.
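For illustration, a minimal sketch of the power-of-two variant in Python (helper names are mine), assuming i2 is known to fit in a fixed number of bits:

def pack_shift(i1, i2, bits):
    # reserve the low `bits` bits for i2, the rest for i1
    assert 0 <= i2 < (1 << bits), "i2 must fit in the reserved bits"
    return (i1 << bits) | i2

def unpack_shift(key, bits):
    return key >> bits, key & ((1 << bits) - 1)

key = pack_shift(5, 9, 4)    # 5 * 16 + 9 = 89
print(unpack_shift(key, 4))  # -> (5, 9)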
Your ###_### string is unique per pair; if you could just store that as a string you win.
There's a better, more space-efficient solution in my answer here.
Related
Given two lists, say A = [1, 3, 2, 7] and B = [2, 3, 6, 3]
Find the set of all products that can be formed by multiplying a number in A with a number in B. (By set, I mean I do not want duplicates.) I am looking for the fastest running time possible. Hash functions are not allowed.
The first approach would be brute force, where we multiply every number from A with every number in B, and if we find a product that is not already in the list, we add it to the list. Finding all possible products costs O(n^2), and verifying whether each product is already present in the list costs another O(n^2), so the total comes to O(n^4).
I am looking to optimize this solution. The first thing that comes to mind is to remove duplicates in list B. In my example, 3 is a duplicate; I do not need to compute the products of all elements from A with the duplicate 3 again. But this still doesn't reduce the overall runtime.
I am guessing the fastest possible run time can be O(n^2) if all the numbers in A and B combined are unique AND prime. That way it is guaranteed that there will be no duplicates and I do not need to verify if my product is already present in the list. So I am thinking if we can pre-process our input list such that it will guarantee unique product values (One way to pre-process is to remove duplicates in list B like I mentioned above).
Is this possible in O(n^2) time and will it make a difference if I only care about the number of unique possible products instead of the actual products?
for i = 1 to A.length:
    for j = 1 to B.length:
        if (A[i] * B[j]) not already present in list: \\ takes O(n^2) time to verify this
            Add (A[i] * B[j]) to list
        end if
    end for
end for
print list
Expected result for the above input: 2, 3, 6, 9, 18, 4, 12, 14, 21, 42
EDIT:
I can think of an O(n^2 log n) solution:
1) Generate all possible product values without worrying about duplicates \ this is O(n^2)
2) Sort these product values \ this is O(n^2 log n), because we have n^2 numbers to sort
3) Remove the duplicates in linear time, since the elements are now sorted
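A minimal sketch of those three steps in Python (the function name is mine); no hashing involved:

def unique_products(A, B):
    products = sorted(a * b for a in A for b in B)  # n^2 values, O(n^2 log n) sort
    if not products:
        return []
    result = [products[0]]
    for p in products[1:]:
        if p != result[-1]:  # duplicates are adjacent after sorting
            result.append(p)
    return result

print(unique_products([1, 3, 2, 7], [2, 3, 6, 3]))
# -> [2, 3, 4, 6, 9, 12, 14, 18, 21, 42]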
Use sets to eliminate duplicates.
A = [3, 6, 6, 8]
B = [7, 8, 56, 3, 2, 8]
setA = set(A)  # deduplicate the inputs first
setB = set(B)
prod = {i * j for i in setA for j in setB}  # the set comprehension drops duplicate products
print(prod)
{64, 448, 6, 168, 9, 42, 12, 16, 48, 18, 336, 21, 24, 56}
Complexity is O(n^2).
Another way is the following.
O(n^3) complexity
prod = []
A = [1, 2, 2, 3]
B = [5, 6, 6, 7]
for i in A:
    for j in B:
        if not prod:
            prod.append(i * j)
            continue
        # insert into the sorted list, skipping duplicates
        for k in range(len(prod)):
            if i * j < prod[k]:
                prod.insert(k, i * j)
                break
            elif i * j == prod[k]:
                break
            if k == len(prod) - 1:
                prod.append(i * j)
print(prod)
Yet another way. This could be using hash functions internally.
from toolz import unique
A=[1,2,2,3]
B=[5,5,7,8]
print(list(unique([i*j for i in A for j in B])))
Okay, so I have a huge array of unsorted elements of an unknown data type (all elements are of the same type, obviously; I just can't make assumptions, as they could be numbers, strings, or any type of object that overloads the < and > operators). The only assumption I can make about those objects is that no two of them are the same, and comparing them (A < B) tells me which one should come first if the array were sorted. The "smallest" should be first.
I receive this unsorted array (type std::vector, but honestly it's more of an algorithm question so no language in particular is expected), a number of objects per "group" (groupSize), and the group number that the sender wants (groupNumber).
I'm supposed to return an array containing groupSize elements, or fewer if the group requested is the last one. (Example: 17 results with a groupSize of 5 would return only two elements if you ask for the fourth group. Also, the fourth group is group number 3, because groups are zero-indexed.)
Example:
Received Array: {1, 5, 8, 2, 19, -1, 6, 6.5, -14, 20}
Received groupSize: 3
Received groupNumber: 2
If the array was sorted, it would be: {-14, -1, 1, 2, 5, 6, 6.5, 8, 19, 20}
If it was split in groups of size 3: {{-14, -1, 1}, {2, 5, 6}, {6.5, 8, 19}, {20}}
I have to return the third group (groupNumber 2 with zero-indexed groups): {6.5, 8, 19}
The biggest problem is the fact that it needs to be lightning fast. I can't sort the array because it has to be faster than O(n log n).
I've tried several methods, but can never get under O(n log n).
I'm aware that I should be looking for a solution that doesn't build all the other groups, skipping most of the steps shown in the example above and creating only the requested group, but I can't figure out a way to do that.
You can find the value of the smallest element s in the group in linear time using the standard C++ std::nth_element function (because you know its index in the sorted array). You can find the largest element S in the group in the same way. After that, you need a linear pass to find all elements x such that s <= x <= S and return them. The total time complexity is O(n).
Note: this answer is not C++ specific. You just need an implementation of the k-th order statistics in linear time.
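As a rough Python translation of the same idea (numpy.partition plays the role of std::nth_element here; it uses introselect, so it is linear on average rather than guaranteed linear):

import numpy as np

def get_group(values, group_size, group_number):
    arr = np.asarray(values)
    lo = group_size * group_number           # sorted index of the group's first element
    if lo >= len(arr):
        return []
    hi = min(lo + group_size, len(arr)) - 1  # sorted index of its last element
    s = np.partition(arr, lo)[lo]  # smallest element of the group
    S = np.partition(arr, hi)[hi]  # largest element of the group
    # one linear pass; all elements are distinct, so this is exactly the group
    return sorted(x for x in arr if s <= x <= S)

print(get_group([1, 5, 8, 2, 19, -1, 6, 6.5, -14, 20], 3, 2))
# -> [6.5, 8.0, 19.0]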
Suppose there was an array E of 2^n elements. For example:
E = [2, 3, 5, 7, 11, 13, 17, 19]
Unfortunately, someone has come along and scrambled the array. They took all elements whose index in binary is of the form 1XX, and added them into the elements at index 0XX (i.e. they did E[0] += E[1], E[2] += E[3], etc.). Then they did the same thing for indexes like X1X into X0X, and for XX1 into XX0.
More specifically, they ran this pseudo-code over the array:
def scramble(e):
    n = len(e).bit_length() - 1  # log2 of len(e); len(e) is a power of two
    for p in range(n):
        m = 1 << p
        for i in range(len(e)):
            if (i & m) != 0:
                e[i - m] += e[i]
In terms of our example, this causes:
E_1 = [2+3, 3, 5+7, 7, 11+13, 13, 17+19, 19]
E_1 = [5, 3, 12, 7, 24, 13, 36, 19]
E_2 = [5+12, 3+7, 12, 7, 24+36, 13+19, 36, 19]
E_2 = [17, 10, 12, 7, 60, 32, 36, 19]
E_3 = [17+60, 10+32, 12+36, 7+19, 60, 32, 36, 19]
E_3 = [77, 42, 48, 26, 60, 32, 36, 19]
You're given the array after it's been scrambled (i.e. your input is E_3). Your goal is to recover the original first element of E, (i.e. the number 2).
One way to get the 2 back is to undo all the scrambling: run the scrambling code, but with the += replaced by a -=. However, doing that is very expensive; it takes n·2^n time. Is there a faster way?
Alternate Form
Stated another way: I give you an array S where the element at index i is the sum of all elements E[j] from a list E whose index j satisfies (j & i) == i. For example, S[101110] is E[101110] + E[111110] + E[101111] + E[111111]. How expensive is it to recover an element of E, given S?
The item at 111111... is easy, because S[111111...] = E[111111...], but S[000000...] depends on all the elements of E in a non-uniform way, so it seems to be harder to get back.
Extended
What if we don't just want to recover the original items, but want to recover sums of the original items that match a mask which can specify must-be-1, no-constraint, and must-be-0? Is this harder?
Call the number of items in the array N, and the size of the bitmasks being used B so N = 2^B.
You can't do better than O(N).
The example solution in the question, which just runs the scrambling in reverse, takes O(N B) time. We can reduce that to O(N) by discarding items that won't contribute to the actual value we read at the end. This makes the unscrambling much simpler, actually: just iteratively subtract the last half of the array from the first half, then discard the last half, until you have one item left.
def unscrambleFirst(S):
    S = list(S)
    while len(S) > 1:
        h = len(S) // 2
        # subtract the last half from the first half, item by item,
        # then discard the last half
        S = [S[i] - S[h + i] for i in range(h)]
    return S[0]

print(unscrambleFirst([77, 42, 48, 26, 60, 32, 36, 19]))  # -> 2
It's not possible to go faster than O(N). We can prove it with linear algebra.
The original array has N independent items, i.e. it is a vector with N degrees of freedom.
The scrambling operation only uses linear operations, and so is equivalent to multiplying that vector by a matrix. (The matrix is [[1, 1], [0, 1]] tiled inside of itself B times; it ends up looking like a Sierpinski triangle).
The scrambling operation matrix is invertible (that's why we can undo the scrambling).
Therefore the scrambled vector must still have N degrees of freedom.
But our O(N) solution is a linear combination of every element of the scrambled vector.
And since the elements of the scrambled vector must all be linearly independent for there to be N degrees of freedom in it, we can't rewrite the usage of any one element with usage of the others.
Therefore we can't change which items we rely on. And since we know that we rely on all of them in one case, it must be all of them in all cases.
Hopefully that's clear enough. The scrambling distributes the first item in a way that requires you to look at every item to get it back.
I need to sort an unspecified amount of numbers in Lua. For example, if I have these numbers: 15, 21, 31, 50, 32, 11, 11, I need Lua to sort them so the first one is the biggest, like this: 50, 32, 31, 21, 15, 11, 11.
What is the easiest way to do this? Remember, it has to work with an unspecified amount of numbers. Thanks!
table.sort sorts a table in place. By default, it uses < to compare elements. To sort them with the bigger element before smaller element:
local t = {15, 21, 31, 50, 32, 11, 11}
table.sort(t, function(a, b) return a > b end)
The number of elements doesn't matter; a table can hold as many elements as available memory allows.
Ever since I started programming, this has been something I have been curious about, but it seems too complicated for me to even attempt.
I'd love to see a solution.
1, 2, 3, 4, 5 // returns 6 (n + 1)
10, 20, 30, 40, 50 //returns 60 (n + 10)
10, 17, 31, 59, 115 //returns 227 ((n * 2) - 3)
What you want to do is called polynomial interpolation. There are many methods (see http://en.wikipedia.org/wiki/Polynomial_interpolation ), but you have to have an upper bound U on the degree of the polynomial and at least U + 1 values.
If you have sequential values, then there is a simple algorithm.
Given a sequence x1, x2, x3, ..., let Delta(x) be the sequence of differences x2 - x1, x3 - x2, x4 - x3, ... . If you have consecutive values of a degree n polynomial, then the nth iterate of Delta is a constant sequence.
For example, the polynomial n^3:
1, 8, 27, 64, 125, 216, ...
7, 19, 37, 61, 91, ...
12, 18, 24, 30, ...
6, 6, 6, ...
To get the next value, fill in another 6 and then work backward.
6, 6, 6, 6 = 6, ...
12, 18, 24, 30, 36 = 30 + 6, ...
7, 19, 37, 61, 91, 127 = 91 + 36, ...
1, 8, 27, 64, 125, 216, 343 = 216 + 127, ...
The restriction on the number of values above ensures that your sequence never becomes empty while performing the differences.
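A minimal sketch of this difference method in Python (the helper name is mine), assuming the input really does come from a polynomial and has enough terms:

def next_term(seq):
    # build difference rows until one is constant
    rows = [list(seq)]
    while len(set(rows[-1])) > 1:
        last = rows[-1]
        rows.append([b - a for a, b in zip(last, last[1:])])
    # fill in one more constant value, then work back up the rows
    value = rows[-1][-1]
    for row in reversed(rows[:-1]):
        value += row[-1]
    return value

print(next_term([1, 8, 27, 64, 125, 216]))  # -> 343, as in the table above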
Sorry to disappoint, but this isn't quite possible (in general), as there are an infinite number of sequences for any given k values. Maybe with certain constraints...
You can take a look at this Everything2 post, which points to Lagrange polynomial.
Formally there is no unique next value to a partial sequence. The problem as usually understood can be clearly stated as:
Assume that the partial sequence exhibited is just sufficient to constrain some generating rule, deduce the simplest possible rule and exhibit the next value generated.
The problem turns on the meaning of "simplest", and is thus not really well suited to algorithmic solutions. It can be done if you confine the problem to a certain class of functional forms for the generating rule, but the details depend on what forms you are willing to accept.
The book Numerical Recipes has pages and pages of real practical algorithms to do this kind of stuff. It's well worth the read!
The first two cases are easy:
>>> seq1 = [1, 2, 3, 4, 5]
>>> seq2 = [10, 20, 30, 40, 50]
>>> def next(seq):
...     m = (seq[1] - seq[0])/(1-0)
...     b = seq[0] - m * 0
...     return m*len(seq) + b
>>> next(seq1)
6
>>> next(seq2)
60
The third case would require solving for a non-linear function.
You can try extrapolation; it will help you find formulas that describe a given sequence.
I am sorry I can't tell you much more, since my mathematics education happened quite a while ago, but you should find more information in good books.
Number series of that kind are often part of "intelligence tests", which leads me to think of such an algorithm as something passing (at least part of) a Turing test, which is quite hard to accomplish.
I like the idea, and sequences one and two suggest to me that this is possible, but then again you cannot generalize, as the sequence could go totally off base. The answer is probably that you cannot generalize; what you can do is write an algorithm to handle a specific sequence, knowing the rule (n+1) or (2n+2), etc.
One thing you may be able to do is take the difference between element i and element i+1, and then between element i+1 and element i+2.
for example, in your third example:
10 17 31 59 115
The difference between 17 and 10 is 7, the difference between 31 and 17 is 14, the difference between 59 and 31 is 28, and the difference between 115 and 59 is 56.
So you note that element(i+1) = element(i) + (7*2^i).
So 17 = 10 + (7*2^0)
And 31 = 17 + (7*2^1)
And so on...
For an arbitrary function it can't be done, but for a linear function like in each of your examples it's simple enough.
You have f(n+1) = a*f(n) + b, and the problem amounts to finding a and b.
Given at least three terms of the sequence, you can do this (you need three because you have three unknowns -- the starting point, a, and b). For instance, suppose you have f(0), f(1) and f(2).
We can solve the equations:
f(1) = a*f(0) + b
f(2) = a*f(1) + b
The solution is:
a = (f(2)-f(1))/(f(1)-f(0))
b = f(1) - f(0)*(f(2)-f(1))/(f(1)-f(0))
(You'll want to separately solve the case where f(0) = f(1) to avoid division by zero.)
Once you have a and b, you can repeatedly apply the formula to your starting value to generate any term in the sequence.
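As a sketch in Python (function names are mine), solving for a and b from the first three terms and then stepping the recurrence once:

def fit(f0, f1, f2):
    # assumes f0 != f1 (the division-by-zero case mentioned above)
    a = (f2 - f1) / (f1 - f0)
    b = f1 - a * f0
    return a, b

def predict_next(seq):
    a, b = fit(*seq[:3])
    return a * seq[-1] + b

print(predict_next([1, 2, 3, 4, 5]))        # -> 6.0   (a=1, b=1)
print(predict_next([10, 20, 30, 40, 50]))   # -> 60.0  (a=1, b=10)
print(predict_next([10, 17, 31, 59, 115]))  # -> 227.0 (a=2, b=-3)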
One could also write a more general procedure that works when given any three points in the sequence (e.g. 4th, 7th, 23rd, or whatever) . . . this is just a simple example.
Again, though, we had to make some assumptions about what form our solution would have . . . in this case taking it to be linear as in your example. One could take it to be a more general polynomial, for instance, but in that case you need more terms of the sequence to find the solution, depending on the degree of the polynomial.
See also the chapter "To Seek Whence Comes a Sequence" from the book "Fluid concepts and creative analogies: computer models of the fundamental mechanisms of thought" by Douglas Hofstadter
http://portal.acm.org/citation.cfm?id=218753.218755