Seeking a bijective function that maps sets to integers - algorithm

For any two sequences a, b, where a = [a1, a2, ..., an] and b = [b1, b2, ..., bn] (0 <= ai, bi <= m), I want to find an integer-valued function f such that f(a) = f(b) if and only if a and b have the same elements, regardless of order. For example, if a = [1,1,2,3], b = [2,1,3,1], c = [3,2,1,3], then f(a) = f(b) and f(a) ≠ f(c).
I know there is a naive algorithm which first sorts the sequence and then maps it to an integer.
For example, after sorting we have a = [1,1,2,3], b = [1,1,2,3], c = [1,2,3,3]; supposing that m = 9 and using decimal conversion, we finally have f(a) = f(b) = 1123 ≠ f(c) = 1233. But this takes O(n log n) time with a comparison-based sorting algorithm (assume non-comparison sorts are not allowed).
Is there a better approach? Something like a hash? An O(n) algorithm?
Note that I also need the function to be easy to invert, meaning we can map an integer back to a sequence (or a multiset, more precisely).
Update: forgive my poor description. Here both m and n can be very large (1 million or more). I also want the upper bound of f to be reasonably small, preferably O(m^n).
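For concreteness, a minimal sketch of the naive sort-then-encode baseline described above (assuming m <= 9 so that each element fits in one decimal digit; naive_encode is an illustrative name):
def naive_encode(seq, m=9):
    # sort (comparison-based, O(n log n)), then read the elements as digits in base m+1
    value = 0
    for x in sorted(seq):
        value = value * (m + 1) + x
    return value

# naive_encode([1,1,2,3]) == naive_encode([2,1,3,1]) == 1123, naive_encode([3,2,1,3]) == 1233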

This works for sufficiently small values of m, and sufficiently small array sizes:
#include <stdio.h>

unsigned primes[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };

unsigned value(unsigned array[], unsigned count);

int main(void)
{
    unsigned one[] = { 1, 2, 2, 3, 5 };
    unsigned two[] = { 2, 3, 1, 5, 2 };
    unsigned val1, val2;

    val1 = value(one, 5);
    val2 = value(two, 5);
    fprintf(stdout, "Val1=%u, Val2=%u\n", val1, val2);

    return 0;
}

unsigned value(unsigned array[], unsigned count)
{
    unsigned val, idx;

    val = 1;
    for (idx = 0; idx < count; idx++) {
        val *= primes[array[idx]];
    }
    return val;
}
For an explanation, see my description here.

Wow, @wildplasser's answer is actually super smart. To expand a bit:
Any number can be decomposed in a unique way into prime numbers (this is the fundamental theorem of arithmetic). His answer relies on that, by building a number whose prime factorization is a representation of the input array. Since multiplication is commutative, the exact order of the elements in the array is of no importance, yet a given number is associated with one (and only one) bag of elements.
His solution can be extended to arbitrary sizes, e.g. in Python:
import operator
import itertools
from functools import reduce   # reduce is no longer a builtin in Python 3

class primes(object):
    def __init__(self):
        self.primes = [2, 3, 5, 7, 11]
        self.stream = itertools.count(13, 2)   # odd candidates for further primes

    def __getitem__(self, i):
        while i >= len(self.primes):
            n = next(self.stream)
            # trial division by the primes found so far, up to sqrt(n)
            while any(n % p == 0 for p in self.primes if p * p <= n):
                n = next(self.stream)
            self.primes.append(n)
        return self.primes[i]

def prod(itr):
    return reduce(operator.mul, itr, 1)

p = primes()

def hash(array):
    return prod(p[i] for i in array)
With the expected results:
>>> hash([1,2,2,3,5])
6825
>>> hash([5,3,2,2,1])
6825
Here, 6825 = 3^1 x 5^2 x 7^1 x 13^1, as 3 is the '1' prime (0-indexed), 5 the '2', etc...
>>> 3**1 * 5**2 * 7**1 * 13**1
6825
Building the number itself takes O(n) multiplications, as long as the final result remains within the integer type you are using (unfortunately, I suspect it can get out of hand quite quickly). Building the series of prime numbers with a sieve of Eratosthenes is asymptotically O(N log log N), where N is the m-th prime. Since asymptotically N ~ m log m, this gives an overall complexity of O(n + m log m log log(m log m)).
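Since the question also asks for the mapping to be invertible, here is a hedged sketch of decoding such a product back into a bag of numbers by trial division over the same lazy prime list p defined above (decode_hash is an illustrative name; it assumes the product fits comfortably in a Python int):
def decode_hash(value):
    bag = []
    i = 0
    while value > 1:
        while value % p[i] == 0:   # divide out the i-th prime as often as possible
            bag.append(i)
            value //= p[i]
        i += 1
    return bag

# decode_hash(6825) == [1, 2, 2, 3, 5]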
Using a similar approach, instead of taking the prime decomposition, we can also consider the array to be the representation of a number in some base. To be consistent, this base has to be larger than the largest number of identical elements (e.g. for [5, 3, 3, 2, 1] the base must be > 2 because there are two 3s). To be on the safe side, you can use len(array) + 1 as the base:
def hash2(array):
    n = len(array) + 1   # a base safely larger than any element's multiplicity
    return sum(n**i for i in array)
>>> hash2([1,5,3,2,2])
8070
>>> hash2([2,1,5,2,3])
8070
You can improve this by first computing the largest number of identical elements in the array and using that to pick the base, but hash2 is a real hash only when used with a fixed base, so the prime decomposition is probably safer if you work with arrays of varying length and composition, as it ALWAYS returns the same unique integer per bag of numbers.
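As a hedged illustration of that last remark, one could pick the base per array and return it alongside the value (hash3 is an illustrative name; two results are only comparable when their bases match):
from collections import Counter

def hash3(array):
    base = max(Counter(array).values()) + 1   # strictly larger than any multiplicity
    return base, sum(base ** i for i in array)

# hash3([1, 1, 2, 3]) == hash3([2, 1, 3, 1]) == (3, 42)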

Related

MinAbsSum task from codility

There already is a topic about this task, but I'd like to ask about my specific approach. The task is:
For a given array A of N integers and a sequence S of N integers from
the set {−1, 1}, we define val(A, S) as follows:
val(A, S) = |sum{ A[i]*S[i] for i = 0..N−1 }|
(Assume that the sum of zero elements equals zero.)
For a given array A, we are looking for such a sequence S that
minimizes val(A,S).
Write a function:
def solution(A)
that, given an array A of N integers, computes the minimum value of
val(A,S) from all possible values of val(A,S) for all possible
sequences S of N integers from the set {−1, 1}.
For example, given array A such that:
A[0] = 1, A[1] = 5, A[2] = 2, A[3] = -2
your function should return 0, since for S = [−1, 1, −1, 1], val(A, S) = 0, which is the minimum possible value.
Write an efficient algorithm for the following assumptions:
N is an integer within the range [0..20,000]; each element of array A
is an integer within the range [−100..100].
My approach is to iterate through the array, track all possible results in a set, and choose the smallest one at the end. To limit the time complexity, I keep only the results that are less than or equal to sum(abs(A)). My code is:
def solution(A):
    if not len(A):
        return 0
    A = [abs(a) for a in A]
    possible_results = set([A[0]])
    limit = sum(A)
    for a in A[1:]:
        possible_so_far = set()
        for val in possible_results:
            if abs(val + a) <= limit:
                possible_so_far.add(abs(val + a))
            if abs(val - a) <= limit:
                possible_so_far.add(abs(val - a))
        possible_results = possible_so_far
    return min(possible_results)
It passes all the correctness tests, but fails some of the performance tests due to timeouts. The detected time complexity is O(N**2 * max(abs(A))), but I don't understand where the square comes from. The main loop is O(N) and the size of the set is at most sum(A), so the final complexity should be O(N * sum(A)).

Generating random sublist from ordered list that maintains ordering

Consider a problem where a random sublist of k items, Y, must be selected from X, a list of n items, where the items in Y must appear in the same order as they do in X. The selected items in Y need not be distinct. One solution is this:
for i = 1 to k
    A[i] = floor(rand * n) + 1
    Y[i] = X[A[i]]
sort Y according to the ordering of A
However, this has running time O(k log k) due to the sort operation. To remove it, it's tempting to try this instead:
high_index = n
for i = 1 to k
    index = floor(rand * high_index) + 1
    Y[k - i + 1] = X[index]
    high_index = index
But this gives a clear bias in the returned list due to the uniform index selection. It feels like an O(k) solution is attainable if the indices in the second solution were drawn non-uniformly. Does anyone know if this is the case, and if so, what properties the distributions from which the marginal indices are drawn should have?
An unbiased O(n+k) solution is trivial; here is high-level pseudocode.
create an empty histogram of size n [initialized with all elements as zeros]
populate it with k uniformly distributed variables in range (do k times: histogram[inclusiveRand(1,n)]++)
iterate the initial list [A], while decreasing elements in the histogram and appending elements to the result list.
Explanation [edit]:
The idea is to choose k elements out of n at random, with a uniform distribution for each, and create a histogram out of it.
This histogram now contains for each index i, how many times A[i] will appear in the resulting Y list.
Now, iterate the list A in-order, and for each element i, insert A[i] into the resulting Y list histogram[i] times.
This guarantees you maintain the order because you insert elements in order, and "never go back".
It also guarantees an unbiased solution, since for each i, j, K: P(histogram[i]=K) = P(histogram[j]=K), so for each K, every element has the same probability of appearing K times in the resulting list.
I believe it can be done in O(k) using the order statistics [X(i)] but I cannot figure it out though :\
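A minimal Python sketch of this histogram approach, under the assumptions stated above (function and variable names are illustrative):
from random import randint

def ordered_sublist(X, k):
    n = len(X)
    histogram = [0] * n
    for _ in range(k):                  # k uniform draws over the indices
        histogram[randint(0, n - 1)] += 1
    Y = []
    for i, count in enumerate(histogram):
        Y.extend([X[i]] * count)        # emit X[i] as many times as it was drawn
    return Y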
By your first algorithm, it suffices to generate k uniform random samples of [0, 1) in sorted order.
Let X1, ..., Xk be these samples. Given that Xk = x, the conditional distribution of X1, ..., Xk-1 is k - 1 uniform random samples of [0, x) in sorted order, so it suffices to sample Xk and recurse.
What's the probability that Xk < x? Each of k independent samples of [0, 1) must be less than x, so the answer (the cumulative distribution function for Xk) is x^k. To sample according to the cdf, all we have to do is invert it on a uniform random sample of [0, 1): pow(random(), 1.0 / k).
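A hedged sketch of that recursion (sorted_uniforms is an illustrative name): sample the maximum first via the inverted cdf, then recurse on the remaining samples, which are uniform on [0, Xk).
from random import random

def sorted_uniforms(k):
    out = [0.0] * k
    upper = 1.0
    for i in range(k - 1, -1, -1):
        upper *= random() ** (1.0 / (i + 1))   # maximum of i+1 uniforms on [0, upper)
        out[i] = upper
    return out

# sorted indices into X of length n: [int(u * n) for u in sorted_uniforms(k)]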
Here's an (expected) O(k) algorithm I actually would consider implementing. The idea is to dump the samples into k bins, sort each bin, and concatenate. Here's some untested Python:
from random import randrange

def samples(n, k):
    bins = [[] for i in range(k)]
    for i in range(k):
        x = randrange(n)
        bins[(x * k) // n].append(x)
    result = []
    for bin in bins:
        bin.sort()
        result.extend(bin)
    return result
Why is this efficient in expectation? Let's suppose we use insertion sort on each bin (each bin has expected size O(1)!). On top of operations that are O(k), we're going to pay proportionally to the sum of the squares of the bin sizes, which is basically the number of collisions. Since the probability of two samples colliding is at most something like 4/k and we have O(k^2) pairs of samples, the expected number of collisions is O(k).
I suspect rather strongly that the O(k) guarantee can be made with high probability.
You can use counting sort to sort Y and thus make the sorting linear with respect to k. However, for that you need one additional array of length n. If we assume you have already allocated it, you may execute the code you are asking for arbitrarily many times with complexity O(k).
The idea is just as you describe, but I will use one more array cnt of size n that I assume is initialized to 0, and another "stack" st that I assume is empty.
for i = 1 to k
    A[i] = floor(rand * n) + 1
    cnt[A[i]] += 1
    if cnt[A[i]] == 1 // needed to be able to traverse the inserted elements faster
        st.push(A[i])

for elem in st
    for i = 0 to cnt[elem]
        Y.add(X[elem])

for elem in st
    cnt[elem] = 0
EDIT: as mentioned by oldboy, what I state in the post is not true - I still have to sort st, which might be a bit better than the original proposition but not by much. So this approach will only be good if k is comparable to n, in which case we just iterate through cnt linearly and construct Y that way. Then st is not needed:
for i = 1 to k
    A[i] = floor(rand * n) + 1
    cnt[A[i]] += 1

for i = 1 to n
    for j = 0 to cnt[i]
        Y.add(X[i])
    cnt[i] = 0
For the first index in Y, the distribution of indices in X is given by:
P(x; n, k) = binomial(n - x + k - 2, k - 1) / norm
where binomial denotes calculation of the binomial coefficient, and norm is a normalisation factor, equal to the total number of possible sublist configurations.
norm = binomial(n + k - 1, k)
So for k = 5 and n = 10 we have:
norm = 2002
P(x = 0) = 0.357, P(x <= 0) = 0.357
P(x = 1) = 0.245, P(x <= 1) = 0.604
P(x = 2) = 0.165, P(x <= 2) = 0.769
P(x = 3) = 0.105, P(x <= 3) = 0.874
P(x = 4) = 0.063, P(x <= 4) = 0.937
... (and so on for the remaining values of x)
We can sample the X index of the first item in Y from this distribution (call it x1). The distribution of the second index in Y can then be sampled in the same way with P(x; (n - x1), (k - 1)), and so on for all subsequent indices.
My feeling now is that the problem is not solvable in O(k), because in general we are unable to sample from the distribution described in constant time. If k = 2 then we can solve in constant time using the quadratic formula (because the probability function simplifies to 0.5(x^2 + x)) but I can't see a way to extend this to all k (my maths isn't great though).
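To make the sequential scheme concrete, here is a hedged (and decidedly not O(k)) Python sketch that draws each index by walking the cumulative sums of the distribution above; names are illustrative and math.comb needs Python 3.8+:
from math import comb
from random import random

def sample_sorted_indices(n, k):
    result, base, remaining = [], 0, n
    for items_left in range(k, 0, -1):
        norm = comb(remaining + items_left - 1, items_left)
        u, x, acc = random() * norm, 0, 0
        while True:
            acc += comb(remaining - x + items_left - 2, items_left - 1)
            if acc > u or x == remaining - 1:
                break
            x += 1
        result.append(base + x)
        base += x            # later indices must be >= this one
        remaining -= x
    return result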
The original list X has n items. There are 2**n possible sublists, since every item will or will not appear in a sublist: each item adds a bit to the enumeration of the possible sublists. You could view this enumeration as a bit word of n bits.
Since you only want sublists with k items, you are interested in bit words with exactly k bits set. A practical algorithm could pick (or not pick) the first element from X, and then recurse into the rightmost n-1 substring of X, taking into account the accumulated number of chosen items. Since the X list is processed in order, the Y list will also be in order.
#include <stdio.h>
#include <string.h>

unsigned pick_k_from_n(char target[], char src[], unsigned k, unsigned n, unsigned done);

unsigned pick_k_from_n(char target[], char src[]
                      , unsigned k, unsigned n, unsigned done)
{
    unsigned count = 0;

    if (k > n) return 0;
    if (k == 0) {
        target[done] = 0;
        puts(target);
        return 1;
    }
    if (n > 0) {
        count += pick_k_from_n(target, src + 1, k, n - 1, done);
        target[done] = *src;
        count += pick_k_from_n(target, src + 1, k - 1, n - 1, done + 1);
    }
    return count;
}

int main(int argc, char **argv)
{
    char result[20];
    char *domain = "OmgWtf!";
    unsigned cnt, len, want;

    want = 3;
    switch (argc) {
    default:
    case 3:
        domain = argv[2];
    case 2:
        sscanf(argv[1], "%u", &want);
    case 1:
        break;
    }
    len = strlen(domain);
    cnt = pick_k_from_n(result, domain, want, len, 0);
    fprintf(stderr, "Count=%u\n", cnt);
    return 0;
}
Removing the recursion is left as an exercise to the reader.
Some output:
plasser#pisbak:~/hiero/src$ ./a.out 3 ABBA
BBA
ABA
ABA
ABB
Count=4
plasser#pisbak:~/hiero/src$

Integer distance

As a single operation between two positive integers we understand multiplying one of the numbers by some prime number or dividing it by such a prime (provided it can be divided by this prime number without a remainder). The distance between a and b, denoted d(a,b), is the minimal number of operations needed to transform the number a into the number b. For example, d(69,42) = 3.
Keep in mind that our function d indeed has the characteristics of a distance - for any positive ints a, b and c we get:
a) d(a,a) == 0
b) d(a,b) == d(b,a)
c) the triangle inequality d(a,b) + d(b,c) >= d(a,c) holds.
You'll be given a sequence of positive ints a_1, a_2, ..., a_n. For every a_i of them, output an a_j (j != i) such that d(a_i, a_j) is as low as possible. For example, the sequence of length 6: {1,2,3,4,5,6} should output {2,1,1,2,1,2}.
This seems really hard to me. What I think would be useful is:
a) if a_i is prime, the only number smaller than a_i that we can reach in one operation is 1 (by dividing a_i by itself); otherwise the only operation allowed is multiplication. Therefore, if we have 1 in our set, then for every prime number d(this_number, 1) is the lowest.
b) likewise, for 1, d(1, any_prime_number) is the lowest.
c) for a non-prime number, we check whether our set contains any of its factors, or any product of its factors.
That's all I can deduce, though. The worst part is I know it will take an eternity for such an algorithm to run and check all the possibilities... Could you please try to help me with it? How should this be done?
Indeed, you can represent any number N as 2^n1 * 3^n2 * 5^n3 * 7^n4 * ... (most of the n's are zeroes).
This way you set a correspondence between a number N and infinite sequence (n1, n2, n3, ...).
Note that your operation is just adding or subtracting 1 at exactly one of the appropriate sequence's places.
Let N and M be two numbers, and their sequences be (n1, n2, n3, ...) and (m1, m2, m3, ...).
The distance between the two numbers is indeed nothing but |n1 - m1| + |n2 - m2| + ...
So, in order to find out the closest number, you need to calculate the sequences for all the input numbers (this is just decomposing them into primes). Having this decomposition, the calculation is straightforward.
Edit:
In fact, you don't need the exact position of each prime factor within the sequence: you just need to know the exponent for each of the prime divisors.
Edit:
this is the simple procedure for converting the number into the chain representation:
#include <map>

typedef std::map<unsigned int, unsigned int> ChainRepresentation;
// maps prime factor -> exponent, default exponent is of course 0

void convertToListRepresentation(int n, ChainRepresentation& r)
{
    // find a divisor
    int d = 2;
    while (n > 1)
    {
        for (; n % d; d++)
        {
            if (n/d < d) // n is prime
            {
                r[n]++;
                return;
            }
        }
        r[d]++;
        n /= d;
    }
}
Edit:
... and the code for distance:
#include <set>

unsigned int chainDistance(ChainRepresentation& c1, ChainRepresentation& c2)
{
    if (&c1 == &c2)
        return 0; // protect from modification done by [] during self-comparison

    int result = 0;
    std::set<unsigned int> visited;

    for (ChainRepresentation::const_iterator it = c1.begin(); it != c1.end(); ++it)
    {
        unsigned int factor = it->first;
        unsigned int exponent = it->second;
        unsigned int exponent2 = c2[factor];
        unsigned int expabsdiff = (exponent > exponent2) ?
            exponent - exponent2 : exponent2 - exponent;
        result += expabsdiff;
        visited.insert(factor);
    }

    for (ChainRepresentation::const_iterator it = c2.begin(); it != c2.end(); ++it)
    {
        unsigned int factor = it->first;
        if (visited.find(factor) != visited.end())
            continue;
        unsigned int exponent2 = it->second;
        // unsigned int exponent = 0;
        result += exponent2;
    }
    return result;
}
For the given limits (100,000 numbers, each not greater than a million) the most straightforward algorithm works (about 1e10 calls to distance()):
For each number in the sequence print its closest neighbor (as defined by minimal distance):
solution = []
for i, ai in enumerate(numbers):
    all_except_i = (aj for j, aj in enumerate(numbers) if j != i)
    solution.append(min(all_except_i, key=lambda x: distance(x, ai)))
print(', '.join(map(str, solution)))
Where distance() can be calculated as (see @Vlad's explanation):
from collections import Counter

def distance(a, b):
    """
    a = p1**n1 * p2**n2 * p3**n3 ...
    b = p1**m1 * p2**m2 * p3**m3 ...
    distance = |m1-n1| + |m2-n2| + |m3-n3| ...
    """
    diff = Counter(prime_factors(b))
    diff.subtract(prime_factors(a))
    return sum(abs(d) for d in diff.values())
Where prime_factors() returns prime factors of a number with corresponding multiplicities {p1: n1, p2: n2, ...}:
from itertools import islice

uniq_primes_factors = dict(islice(prime_factors_gen(), max(numbers)))

def prime_factors(n):
    return dict(multiplicities(n, uniq_primes_factors[n]))
Where multiplicities() function given n and its factors returns them with their corresponding multiplicities (how many times a factor divides the number without a remainder):
def multiplicities(n, factors):
    assert n > 0
    for prime in factors:
        alpha = 0  # multiplicity of `prime` in `n`
        q, r = divmod(n, prime)
        while r == 0:  # `prime` is a factor of `n`
            n = q
            alpha += 1
            q, r = divmod(n, prime)
        yield prime, alpha
prime_factors_gen() yields the prime factors for each natural number. It uses the Sieve of Eratosthenes algorithm to find prime numbers. The implementation is based on the gen_primes() function by @Eli Bendersky:
from collections import defaultdict
from itertools import count

def prime_factors_gen():
    """Yield prime factors for each natural number."""
    D = defaultdict(list)  # nonprime -> prime factors of `nonprime`
    D[1] = []              # `1` has no prime factors
    for q in count(1):     # Sieve of Eratosthenes algorithm
        if q not in D:     # `q` is a prime number
            D[q + q] = [q]
            yield q, [q]
        else:              # q is a composite
            for p in D[q]: # `p` is a factor of `q`: `q == m*p`
                # therefore `p` is a factor of `p + q == p + m*p` too
                D[p + q].append(p)
            yield q, D[q]
            del D[q]
See full example in Python.
Output
2, 1, 1, 2, 1, 2
Without bounds on how large your numbers can be and how many numbers the input can contain, we can't really deduce that it will take "an eternity" to complete. I am tempted to suggest the most "obvious" solution I can think of.
Given the factorization of the numbers it is very easy to find their distance
60 = (2^2)*(3^1)*(5^1)*(7^0)
42 = (2^1)*(3^1)*(5^0)*(7^1)
distance = 3
Calculating this factorization using the naive trial division should take at most O(sqrt(N)) time per number, where N is the number being factorized.
Given the factorizations, you only have O(n^2) combinations to worry about, where n is the number of input values. If you store all the factorizations so that you only compute them once, this step shouldn't take that long unless you have a really large amount of numbers.
One does wonder if there is a faster algorithm, though. Perhaps it is possible to use some greatest common divisor trick to avoid computing large factorizations, and perhaps we can use graph algorithms to find the distances in a smarter way.
Haven't really thought this through, but it seems to me that to get from prime A to prime B you multiply A * B and then divide by A.
If you thus break the initial non-prime A & B into their prime factors, factor out the common prime factors, and then use the technique in the first paragraph to convert the unique primes, you should be following a minimal path to get from A to B.
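To make the "factor out the common prime factors" idea concrete, here is a hedged Python sketch (omega and distance are illustrative names): the distance equals the number of prime factors, counted with multiplicity, of a/g plus that of b/g, where g = gcd(a, b).
from math import gcd

def omega(n):
    # number of prime factors of n counted with multiplicity (naive trial division)
    count, d = 0, 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            count += 1
        d += 1
    if n > 1:
        count += 1
    return count

def distance(a, b):
    g = gcd(a, b)
    return omega(a // g) + omega(b // g)

# distance(69, 42) == 3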

Finding the closest integer fraction to a given random real between 0..1, given ranges of numerator and denominator

Given two ranges of positive integers x: [1 ... n] and y: [1 ... m] and random real R from 0 to 1, I need to find the pair of elements (i,j) from x and y such that x_i / y_j is closest to R.
What is the most efficient way to find this pair?
Using the Farey sequence
This is a simple and mathematically beautiful algorithm to solve this: run a binary search, where on each iteration the next number is given by the mediant formula (below). By the properties of the Farey sequence that number is the one with the smallest denominator within that interval. Consequently this sequence will always converge and never 'miss' a valid solution.
In pseudocode:
input: m, n, R
a_num = 0, a_denom = 1
b_num = 1, b_denom = 1
repeat:
-- interestingly c_num/c_denom is already in reduced form
c_num = a_num + b_num
c_denom = a_denom + b_denom
-- if the numbers are too big, return the closest of a and b
if c_num > n or c_denom > m then
if R - a_num/a_denom < b_num/b_denom - R then
return a_num, a_denom
else
return b_num, b_denom
-- adjust the interval:
if c_num/c_denom < R then
a_num = c_num, a_denom = c_denom
else
b_num = c_num, b_denom = c_denom
goto repeat
Even though it's fast on average (my educated guess is that it's O(log max(m,n))), it can still be slow if R is close to a fraction with a small denominator. For example, finding an approximation to 1/1000000 with m = n = 1000000 will take a million iterations.
The standard approach to approximating reals with rationals is computing the continued fraction series (see [1]). Put a limit on the numerator and denominator while computing parts of the series, and the last value before you break the limits is a fraction very close to your real number.
This will find a very good approximation very fast, but I'm not sure this will always find a closest approximation. It is known that
any convergent [partial value of the continued fraction expansion] is nearer to the continued fraction than any other fraction whose denominator is less than that of the convergent
but there may be approximations with larger denominator (still below your limit) that are better approximations, but are not convergents.
[1] http://en.wikipedia.org/wiki/Continued_fraction
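For illustration, a hedged sketch of that approach (best_convergent is an illustrative name): compute the continued-fraction convergents of R and keep the last one whose numerator stays within n and whose denominator stays within m.
from math import floor

def best_convergent(R, n, m):
    h2, k2 = 0, 1            # convergent two steps back
    h1, k1 = 1, 0            # convergent one step back
    x, best = R, None
    for _ in range(64):      # plenty of terms for double precision
        a = floor(x)
        h, k = a * h1 + h2, a * k1 + k2
        if h > n or k > m:
            break
        if h >= 1 and k >= 1:
            best = (h, k)
        h2, k2, h1, k1 = h1, k1, h, k
        frac = x - a
        if frac == 0:
            break
        x = 1 / frac
    return best

# best_convergent(113/205, 50, 200) == (43, 78), matching approx2 further below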
Given that R is a real number such that 0 <= R <= 1, with integers x: [1 ... n] and integers y: [1 ... m], it is assumed that n <= m, since if n > m then x[n]/y[m] would be greater than 1, which cannot be the closest approximation to R.
Therefore, the best approximation of R with the denominator d will be either floor(R*d) / d or ceil(R*d) / d.
The problem can be solved in O(m) time and O(1) space (in Python):
from random import random
from math import floor

def fractionize(R, n, d):
    error = abs(n/d - R)
    return (n, d, error)  # (numerator, denominator, absolute difference to R)

def better(a, b):
    return a if a[2] < b[2] else b

def approximate(R, n, m):
    best = (0, 1, R)
    for d in range(1, m + 1):
        n1 = min(n, int(floor(R * d)))
        n2 = min(n, n1 + 1)  # ceil(R*d)
        best = better(best, fractionize(R, n1, d))
        best = better(best, fractionize(R, n2, d))
    return best

if __name__ == '__main__':
    def main():
        R = random()
        n = 30
        m = 100
        print(R, approximate(R, n, m))
    main()
I'll probably get flamed, but a lookup table might work: compute the fractional value x/y for every possible pair up front and store it in a 2-D array indexed by the two integer parts, with each element containing the real value of the fraction. Since the X and Y parts are discrete, the table is finite. That still leaves the actual searching part, though...
Rather than a completely brute force search, do a linear search over the shortest of your lists, using round to find the best match for each element. Maybe something like this:
best_x, best_y = (1, 1)
for x in 1...n:
    y = max(1, min(m, round(x/R)))
    # optional optimization (if you have a fast gcd)
    if gcd(x, y) > 1:
        continue
    if abs(R - x/y) < abs(R - best_x/best_y):
        best_x, best_y = (x, y)
return (best_x, best_y)
Not at all sure whether the gcd "optimization" will ever be faster...
The Solution:
You can do this in O(1) space and O(m log(n)) time:
There is no need to create any list to search.
The pseudocode may be buggy, but the idea is this:
r: input number to search.
n,m: the ranges.
for (int i = 1; i <= m; i++)
{
    minVal = min(Search(i, 1, n, r), minVal);
}

// x and y are start and end of array:
decimal Search(i, x, y, r)
{
    if (i/x > r)
        return i/x - r;

    decimal middle1 = i / Cill((x+y)/2);
    decimal middle2 = i / Roof((x+y)/2);
    decimal dist = min(middle1, middle2);
    decimal searchResult = 100000;

    if (middle > r)
        searchResult = Search(i, x, Cill((x+y)/2), r);
    else
        searchResult = Search(i, Roof((x+y)/2), y, r);

    if (searchResult < dist)
        dist = searchResult;

    return dist;
}
Finding the index is left as homework for the reader.
Description: I think you can understand the idea from the code, but let's trace one iteration of the for loop:
when i = 1:
you should search within the numbers below:
1, 1/2, 1/3, 1/4, ..., 1/n
you check the number against the intervals (1, 1/cill(n/2)) and (1/floor(n/2), 1/n) and do a similar binary search on the matching half to find the smallest difference.
You should do this loop for all items, so it will be done m times, and each time it takes O(log(n)). This function can be improved with some mathematical rules, but it would get complicated, so I skip it.
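For reference, a hedged sketch of the same per-numerator binary search in a cleaner form, using the question's convention of numerators up to n and denominators up to m (roughly O(n log m); closest_fraction is an illustrative name):
def closest_fraction(R, n, m):
    best = (1, 1, abs(R - 1.0))
    for i in range(1, n + 1):
        lo, hi = 1, m
        while lo < hi:                   # largest denominator j with i/j >= R, if any
            mid = (lo + hi + 1) // 2
            if i / mid >= R:
                lo = mid
            else:
                hi = mid - 1
        for j in (lo, min(m, lo + 1)):   # check both sides of the crossing point
            err = abs(R - i / j)
            if err < best[2]:
                best = (i, j, err)
    return best[:2]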
If the denominator of R is larger than m, then use the Farey method (which the Fraction.limit_denominator method implements) with a limit of m to get a fraction a/b where b is no larger than m; otherwise let a/b = R. With b <= m, either a <= n and you are done, or else let M = math.ceil(n/R) and re-run the Farey method.
def approx2(a, b, n, m):
    from math import ceil
    from fractions import Fraction
    R = Fraction(a, b)
    if R < Fraction(1, m):
        return 1, m
    r = R.limit_denominator(m)
    if r.numerator > n:
        M = ceil(n/R)
        r = R.limit_denominator(M)
    return r.numerator, r.denominator
>>> approx2(113, 205, 50, 200)
(43, 78)
It might be possible to just run the Farey method once using a limiting denominator of min(ceil(n/R), m) but I am not sure about that:
def approx(a, b, n, m):
    from math import ceil
    from fractions import Fraction
    R = Fraction(a, b)
    if R < Fraction(1, m):
        return 1, m
    r = R.limit_denominator(min(ceil(n/R), m))
    return r.numerator, r.denominator

Generating shuffled range using a PRNG rather than shuffling

Is there any known algorithm that can generate a shuffled range [0..n) in linear time and constant space (when output produced iteratively), given an arbitrary seed value?
Assume n may be large, e.g. in the many millions, so the ability to produce every possible permutation is not required, not least because it's infeasible (the seed value space would need to be huge). This is also the reason for the requirement of constant space. (So I'm specifically not looking for an array-shuffling algorithm, since that requires the range to be stored in an array of length n, and so would use linear space.)
I'm aware of question 162606, but it doesn't present an answer to this particular question - the mappings from permutation indexes to permutations given in that question would require a huge seed value space.
Ideally, it would act like an LCG with a period and range of n, but the art of selecting a and c for an LCG is subtle. Simply satisfying the constraints on a and c for a full-period LCG may satisfy my requirements, but I am wondering if there are any better ideas out there.
Based on Jason's answer, I've made a simple, straightforward implementation in C#. Find the next power of two greater than N. This makes it trivial to generate a and c, since c needs to be relatively prime to M (and since M is a power of two, that just means c must be odd), and (a - 1) needs to be divisible by every prime factor of M (here just 2) and by 4, because M is a multiple of 4. Statistically, it should take one to two iterations of the congruence to generate the next number (since 2N >= M >= N).
using System;
using System.Collections.Generic;

class Program
{
    IEnumerable<int> GenerateSequence(int N)
    {
        Random r = new Random();
        int M = NextLargestPowerOfTwo(N);
        int c = r.Next(M / 2) * 2 + 1; // make c any odd number between 0 and M
        int a = r.Next(M / 4) * 4 + 1; // M = 2^m, so make (a-1) divisible by all prime factors, and 4
        int start = r.Next(M);
        int x = start;
        do
        {
            x = (a * x + c) % M;
            if (x < N)
                yield return x;
        } while (x != start);
    }

    int NextLargestPowerOfTwo(int n)
    {
        n |= (n >> 1);
        n |= (n >> 2);
        n |= (n >> 4);
        n |= (n >> 8);
        n |= (n >> 16);
        return (n + 1);
    }

    static void Main(string[] args)
    {
        Program p = new Program();
        foreach (int n in p.GenerateSequence(1000))
        {
            Console.WriteLine(n);
        }
        Console.ReadKey();
    }
}
Here is a Python implementation of the linear congruential generator from FryGuy's answer; I needed to write it anyway and thought it might be useful for others.
import random
import math

def lcg(start, stop):
    N = stop - start
    # M is the next largest power of 2
    M = int(math.pow(2, math.ceil(math.log(N + 1, 2))))
    # c is any odd number between 0 and M
    c = random.randint(0, M // 2 - 1) * 2 + 1
    # M = 2^m, so make (a-1) divisible by all prime factors and 4
    a = random.randint(0, M // 4 - 1) * 4 + 1
    first = random.randint(0, M - 1)
    x = first
    while True:
        x = (a * x + c) % M
        if x < N:
            yield start + x
        if x == first:
            break

if __name__ == "__main__":
    for x in lcg(100, 200):
        print(x, end=" ")
Sounds like you want an algorithm which is guaranteed to produce a cycle from 0 to n-1 without any repeats. There are almost certainly a whole bunch of these depending on your requirements; group theory would be the most helpful branch of mathematics if you want to delve into the theory behind it.
If you want fast and don't care about predictability/security/statistical patterns, an LCG is probably the simplest approach. The wikipedia page you linked to contains this (fairly simple) set of requirements:
The period of a general LCG is at most m, and for some choices of a much less than that. The LCG will have a full period if and only if:
c and m are relatively prime,
a - 1 is divisible by all prime factors of m,
a - 1 is a multiple of 4 if m is a multiple of 4.
Alternatively, you could choose a period N >= n, where N is the smallest value that has convenient numerical properties, and just discard any values produced between n and N-1. For example, the lowest N = 2^k - 1 >= n would let you use linear feedback shift registers (LFSR). Or find your favorite cryptographic algorithm (RSA, AES, DES, whatever) and, given a particular key, figure out the space N of numbers it permutes, and for each step apply encryption once.
If n is small but you want the security to be high, that's probably the trickiest case, as any sequence S is likely to have a period N much higher than n, but it is also nontrivial to derive a nonrepeating sequence of numbers with a shorter period than N (e.g. if you could take the output of S mod n and guarantee a nonrepeating sequence of numbers, that would give information about S that an attacker might use).
See my article on secure permutations with block ciphers for one way to do it.
Look into Linear Feedback Shift Registers, they can be used for exactly this.
The short way of explaining them is that you start with a seed and then iterate using the formula
x = (x << 1) | f(x)
where f(x) can only return 0 or 1.
If you choose a good function f, x will cycle through all values between 1 and 2^n-1 (where n is some number), in a good, pseudo-random way.
Example functions can be found here, e.g. for 63 values you can use
f(x) = ((x >> 6) & 1) ^ ((x >> 5) & 1)
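For illustration, a hedged Python sketch of such a generator for a 6-bit register (63 values); the tap positions, bits 5 and 4, are an assumption taken from standard maximal-length LFSR tap tables and would differ for other register widths:
def lfsr6(seed=1):
    x = seed & 0x3F                              # any nonzero 6-bit start state
    while True:
        yield x
        bit = ((x >> 5) & 1) ^ ((x >> 4) & 1)    # f(x) for this register width
        x = ((x << 1) | bit) & 0x3F              # shift in the feedback bit, keep 6 bits

# visits every value in 1..63 exactly once per period:
# from itertools import islice; print(list(islice(lfsr6(1), 63)))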
