Integer distance - algorithm

As a single operation between two positive integers we understand
multiplying one of the numbers by some prime number or dividing it by
such (provided it can be divided by this prime number without
the remainder). The distance between a and b denoted as d(a,b) is a
minimal amount of operations needed to transform number a into number
b. For example, d(69,42)=3.
Keep in mind that our function d indeed has characteristics of the
distance - for any positive ints a, b and c we get:
a) d(a,a)==0
b) d(a,b)==d(b,a)
c) the inequality of a triangle d(a,b)+d(b,c)>=d(a,c) is fulfilled.
You'll be given a sequence of positive ints a_1, a_2,...,a_n. For every a_i of them
output such a_j (j!=i) that d(a_i, a_j) is as low as possible. For example, the sequence of length 6: {1,2,3,4,5,6} should output {2,1,1,2,1,2}.
This seems really hard to me. What I think would be useful is:
a) if a_i is prime, we are unable to make anything less than a_i (unless it's 1) so the only operation allowed is multiplication. Therefore, if we have 1 in our set, for every prime number d(this_number, 1) is the lowest.
b) also, for 1 d(1, any_prime_number) is the lowest.
c) for a non-prime number we check if we have any of its factors in our set or multiplication of its factors
That's all I can deduce, though. The worst part is I know it will take an eternity for such an algorithm to run and check all the possibilities... Could you please try to help me with it? How should this be done?

Indeed, you can represent any number N as 2^n1 * 3^n2 * 5^n3 * 7^n4 * ... (most of the n's are zeroes).
This way you set a correspondence between a number N and infinite sequence (n1, n2, n3, ...).
Note that your operation is just adding or subtracting 1 at exactly one of the appropriate sequence's places.
Let N and M be two numbers, and their sequences be (n1, n2, n3, ...) and (m1, m2, m3, ...).
The distance between the two numbers is indeed nothing but |n1 - m1| + |n2 - m2| + ...
So, in order to find out the closest number, you need to calculate the sequences for all the input numbers (this is just decomposing them into primes). Having this decomposition, the calculation is straightforward.
Edit:
In fact, you don't need the exact position of your prime factor: you just need to know, which is the exponent for each of the prime divisors.
Edit:
this is the simple procedure for converting the number into the chain representation:
#include <map>
typedef std::map<unsigned int, unsigned int> ChainRepresentation;
// maps prime factor -> exponent, default exponent is of course 0
void convertToListRepresentation(int n, ChainRepresentation& r)
{
// find a divisor
int d = 2;
while (n > 1)
{
for (; n % d; d++)
{
if (n/d < d) // n is prime
{
r[n]++;
return;
}
}
r[d]++;
n /= d;
}
}
Edit:
... and the code for distance:
#include <set>
unsigned int chainDistance(ChainRepresentation& c1, ChainRepresentation& c2)
{
if (&c1 == &c2)
return 0; // protect from modification done by [] during self-comparison
int result = 0;
std::set<unsigned int> visited;
for (ChainRepresentation::const_iterator it = c1.begin(); it != c1.end(); ++it)
{
unsigned int factor = it->first;
unsigned int exponent = it->second;
unsigned int exponent2 = c2[factor];
unsigned int expabsdiff = (exponent > exponent2) ?
exponent - exponent2 : exponent2 - exponent;
result += expabsdiff;
visited.insert(factor);
}
for (ChainRepresentation::const_iterator it = c2.begin(); it != c2.end(); ++it)
{
unsigned int factor = it->first;
if (visited.find(factor) != visited.end())
continue;
unsigned int exponent2 = it->second;
// unsigned int exponent = 0;
result += exponent2;
}
return result;
}

For the given limits: 100_000 numbers not greater than a million the most-straightforward algorithm works (1e10 calls to distance()):
For each number in the sequence print its closest neighbor (as defined by minimal distance):
solution = []
for i, ai in enumerate(numbers):
all_except_i = (aj for j, aj in enumerate(numbers) if j != i)
solution.append(min(all_except_i, key=lambda x: distance(x, ai)))
print(', '.join(map(str, solution)))
Where distance() can be calculated as (see #Vlad's explanation):
def distance(a, b):
"""
a = p1**n1 * p2**n2 * p3**n3 ...
b = p1**m1 * p2**m2 * p3**m3 ...
distance = |m1-n1| + |m2-n2| + |m3-n3| ...
"""
diff = Counter(prime_factors(b))
diff.subtract(prime_factors(a))
return sum(abs(d) for d in diff.values())
Where prime_factors() returns prime factors of a number with corresponding multiplicities {p1: n1, p2: n2, ...}:
uniq_primes_factors = dict(islice(prime_factors_gen(), max(numbers)))
def prime_factors(n):
return dict(multiplicities(n, uniq_primes_factors[n]))
Where multiplicities() function given n and its factors returns them with their corresponding multiplicities (how many times a factor divides the number without a remainder):
def multiplicities(n, factors):
assert n > 0
for prime in factors:
alpha = 0 # multiplicity of `prime` in `n`
q, r = divmod(n, prime)
while r == 0: # `prime` is a factor of `n`
n = q
alpha += 1
q, r = divmod(n, prime)
yield prime, alpha
prime_factors_gen() yields prime factors for each natural number. It uses Sieve of Eratosthenes algorithm to find prime numbers. The implementation is based on gen_primes() function by #Eli Bendersky:
def prime_factors_gen():
"""Yield prime factors for each natural number."""
D = defaultdict(list) # nonprime -> prime factors of `nonprime`
D[1] = [] # `1` has no prime factors
for q in count(1): # Sieve of Eratosthenes algorithm
if q not in D: # `q` is a prime number
D[q + q] = [q]
yield q, [q]
else: # q is a composite
for p in D[q]: # `p` is a factor of `q`: `q == m*p`
# therefore `p` is a factor of `p + q == p + m*p` too
D[p + q].append(p)
yield q, D[q]
del D[q]
See full example in Python.
Output
2, 1, 1, 2, 1, 2

Without bounds on how large your numbers can be and how many numbers can be on the input, we can't really deduce it will take "an eternity" to complete. I am tempted to suggest the most "obvious" solution I can think of
Given the factorization of the numbers it is very easy to find their distance
60 = (2^2)*(3^1)*(5^1)*(7^0)
42 = (2^1)*(3^1)*(5^0)*(7^1)
distance = 3
Calculating this factorization using the naive trial division should take at most O(sqrt(N)) time per number, where N is the number being factorized.
Given the factorizations, you only have O(n^2) combinations to worry about, where n is the ammount of numbers. If you store all the factorizations so that you only compute them once, this step shouldn't take that long unless you have a really large amount of numbers.
You do wonder if there is a faster algorithm though. Perhaps it is possible to do some greatest common divisor trick to avoid computing large factorizations and perhaps we can use some graph algorithms to find the distances in a smarter way.

Haven't really thought this through, but it seems to me that to get from prime A to prime B you multiply A * B and then divide by A.
If you thus break the initial non-prime A & B into their prime factors, factor out the common prime factors, and then use the technique in the first paragraph to convert the unique primes, you should be following a minimal path to get from A to B.

Related

How can we count the number of pairs of coprime integers in an array of integers? (CSES) [duplicate]

Having a sequence of n <= 10^6 integers, all not exceeding m <= 3*10^6, I'd like to count how many coprime pairs are in it. Two numbers are coprime if their greatest common divisor is 1.
It can be done trivially in O(n^2 log n), but this is obviously way to slow, as the limit suggests something closer to O(n log n). One thing than can be done quickly is factoring out all the numbers, and also throwing out multiple occurences of the same prime in each, but that doesn't lead to any significant improvement. I also thought of counting the opposite - pairs that have a common divisor. It could be done in groups - firstly counting all the pairs that their smallest common prime divisor is 2, then 3, 5, and etc., but it seems to me like an other dead end.
I've come up with a slightly faster alternative based on your answer. On my work PC my C++ implementation (bottom) takes about 350ms to solve any problem instance; on my old laptop, it takes just over 1s. This algorithm avoids all division and modulo operations, and uses only O(m) space.
As with your algorithm, the basic idea is to apply the Inclusion-Exclusion Principle by enumerating every number 2 <= i <= m that contains no repeated factors exactly once, and for each such i, counting the number of numbers in the input that are divisible by i and either adding or subtracting this from the total. The key difference is that we can do the counting part "stupidly", simply by testing whether each possible multiple of i appears in the input, and this still takes just O(m log m) time.
How many times does the innermost line c += v[j].freq; in countCoprimes() repeat? The body of the outer loop is executed once for each number 2 <= i <= m that contains no repeated prime factors; this iteration count is trivially upper-bounded by m. The inner loop advances i steps at a time through the range [2..m], so the number of operations it performs during a single outer loop iteration is upper-bounded by m / i. Therefore the total number of iterations of the innermost line is upper-bounded by the sum from i=2 to m of m/i. The m factor can be moved outside the sum to get an upper bound of
m * sum{i=2..m}(1/i)
That sum is a partial sum in a harmonic series, and it is upper-bounded by log(m), so the total number of innermost loop iterations is O(m log m).
extendedEratosthenes() is designed to reduce constant factors by avoiding all divisions and keeping to O(m) memory usage. All countCoprimes() actually needs to know for a number 2 <= i <= m is (a) whether it has repeated prime factors, and if it doesn't, (b) whether it has an even or odd number of prime factors. To calculate (b) we can make use of the fact that the Sieve of Eratosthenes effectively "hits" any given i with its distinct prime factors in increasing order, so we can just flip a bit (the parity field in struct entry) to keep track of whether i has an even or odd number of factors. Each number starts with a prod field equal to 1; to record (a) we simply "knock out" any number that contains the square of a prime number as a factor by setting its prod field to 0. This field serves a dual purpose: if v[i].prod == 0, it indicates that i was discovered to have repeated factors; otherwise it contains the product of the (necessarily distinct) factors discovered so far. The (fairly minor) utility of this is that it allows us to stop the main sieve loop at the square root of m, instead of going all the way up to m: by now, for any given i that has no repeated factors, either v[i].prod == i, in which case we have found all the factors for i, or v[i].prod < i, in which case i must have exactly one factor > sqrt(3000000) that we have not yet accounted for. We can find all such remaining "large factors" with a second, non-nested loop.
#include <iostream>
#include <vector>
using namespace std;
struct entry {
int freq; // Frequency that this number occurs in the input list
int parity; // 0 for even number of factors, 1 for odd number
int prod; // Product of distinct prime factors
};
const int m = 3000000; // Maximum input value
int n = 0; // Will be number of input values
vector<entry> v;
void extendedEratosthenes() {
int i;
for (i = 2; i * i <= m; ++i) {
if (v[i].prod == 1) {
for (int j = i, k = i; j <= m; j += i) {
if (--k) {
v[j].parity ^= 1;
v[j].prod *= i;
} else {
// j has a repeated factor of i: knock it out.
v[j].prod = 0;
k = i;
}
}
}
}
// Fix up numbers with a prime factor above their square root.
for (; i <= m; ++i) {
if (v[i].prod && v[i].prod != i) {
v[i].parity ^= 1;
}
}
}
void readInput() {
int i;
while (cin >> i) {
++v[i].freq;
++n;
}
}
void countCoprimes() {
__int64 total = static_cast<__int64>(n) * (n - 1) / 2;
for (int i = 2; i <= m; ++i) {
if (v[i].prod) {
// i must have no repeated factors.
int c = 0;
for (int j = i; j <= m; j += i) {
c += v[j].freq;
}
total -= (v[i].parity * 2 - 1) * static_cast<__int64>(c) * (c - 1) / 2;
}
}
cerr << "Total number of coprime pairs: " << total << "\n";
}
int main(int argc, char **argv) {
cerr << "Initialising array...\n";
entry initialElem = { 0, 0, 1 };
v.assign(m + 1, initialElem);
cerr << "Performing extended Sieve of Eratosthenes...\n";
extendedEratosthenes();
cerr << "Reading input...\n";
readInput();
cerr << "Counting coprimes...\n";
countCoprimes();
return 0;
}
Further exploiting the ideas I mentioned in my question, I actually managed to come up with a solution myself. As some of you may be interested in it, I will describe it briefly. It does work in O(m log m + n), I've already implemented it in C++ and tested - solves the biggest cases (10^6 integers) in less than 5 seconds.
We have n integers, all not greater than m. We start by doing Eratosthenes Sieve mapping each integer up to m to it's smalles prime factor, allowing us to factor out any number not greater than m in O(log m) time. Then for all given numbers A[i], as long as there is some prime p than divides A[i] in a power greater than one, we divide A[i] by it, because when asking if two numbers are coprime we can omit the exponents. That leaves us with all A[i] being products of distinct primes.
Now, let us assume that we were able to construct in a reasonable time a table T, such that T[i] is number of entries A[j] such that i divides A[j]. This is somehow similar to the approach #Brainless took in his second answer. Constructing table T quickly was the technic I spoke about in the comments below my question.
From now, we will work by Inclusion-Exclusion Principle. Having T, for each i we calculate P[i] - the amount of pairs (j,k) such that A[j] and A[k] are both divisible by i. Then to compute the answer, sum all P[i], taking minus sign before those P[i] for which i has an even number of prime divisors. Note that all prime divisors of i are distinct, because for all other indices i P[i] equals 0. By Inclusion-Exclusion each pair will be counted only once. To see this differently, take a pair A[i] and A[j], assuming that they share exactly k common prime divisors. Then this pair will be counted k times, then discounted kC2 times, counted kC3 times, discounted kC4 times... for nCk see the Newton's Symbol. Some mathematical manipulation makes us see that the considered pair will be counted 1 - (1-1)^k = 1 times, what concludes the proof.
Steps made so far required O(m log log m) for the Sieve and O(m) for computing the result. The last thing to do is to construct array T. We could for every A[i] just increment T[j] for all j dividing i. As A[i] can have at most O(sqrt(A[i])) divisors (and in practice even less than that) then we could construct T in O(n sqrt m). But we can do better than that!
Take two-dimensional array W. At each moment a following invariant holds - if for each non-zero W[i][j] we would increment the counter in table T by W[i][j] for all numbers that divide i, and also share the exact exponents i has in j smallest primes divisors of i, then T would be constructed properly. As this may seem a little confusing, let's see it in action. At start, to make the invariant true, for each A[i] we just increment W[A[i]][0]. Also note that a number not exceeding m can have at most O(log m) prime divisors, so the overall size of W is O(m log m). Now we see that an information stored in W[i][j] can be "pushed forward" in a following way: consider p to be (j+1)-th prime divisor of i, assuming it has one. Then some divisor of i can either have p with an exponent same as in i, or lower. First of these cases is W[i][j+1] - we add another prime that has to be "fully taken" by a divisor. Second case is W[i/p][j] as a divisor of i that doesn't have p with a highest exponent must also divide i/p. And that's it! We consider all i in descending order, then j in ascending order. We "push forward" information from W[i][j]. See that if i has exactly j prime divisors, then the information from it cannot be pushed, but we don't really need that! If i has j prime divisors, then W[i][j] basically says: increment by W[i][j] only index i in array T. So when all the information has been pushed to "last rows" in each W[i] we pass through those rows and finish constructing T. As each cell of W[i][j] has been visited once, this algorithm takes O(m log m) time, and also O(n) at the begining. That concludes the construction. Here's some C++ code from the actual implementation:
FORD(i,SIZE(W)-1,2) //i in descending order
{
int v = i, p;
FOR(j,0,SIZE(W[i])-2) //exclude last row
{
p = S[v]; //j-th divisor; S[v] - smallest prime divisor of v
while (v%p == 0) v /= p;
W[i][j+1] += W[i][j];
W[i/p][j] += W[i][j];
}
T[i] = W[i].back();
}
At the end I'd say that I think array T can be constructed faster and simpler than what I've shown. If anyone has some neat idea about how it could be done, I would appreciate all feedback.
Here's an idea based on the formula for the complete sequence 1..n, found on http://oeis.org/A018805:
a(n) = 2*( Sum phi(j), j=1..n ) - 1, where phi is Euler's totient function
Iterate over the sequence, S. For each term, S_i:
for each of the prime factors, p, of S_i:
if a hash for p does not exist:
create a hash with index p that points to a set of all indexes of S except i,
and a counter set to 1, representing how many terms of S are divisible by p so far
else:
delete i in the existing set of indexes and increment the counter
Sort the hashes for S_i's prime factors by their counters in descending order. Starting with
the largest counter (which means the smallest set), make a list of indexes up to i that are also
members of the next smallest set, until the sets are exhausted. Add the remaining number of
indexes in the list to the cumulative total.
Example:
sum phi' [4,7,10,15,21]
S_0: 4
prime-hash [2:1-4], counters [2:1]
0 indexes up to i in the set for prime 2
total 0
S_1: 7
prime hash [2:1-4; 7:0,2-4], counters [2:1, 7:1]
1 index up to i in the set for prime 7
total 1
S_2: 10
prime hash [2:1,3-4; 5:0-1,3-4; 7:0,2-4], counters [2:2, 5:1, 7:1]
1 index up to i in the set for prime 2, which is also a member
of the set for prime 5
total 2
S_3: 15
prime hash [2:1,3-4; 5:0-1,4; 7:0,2-4; 3:0-2,4], counters [2:2: 5:2, 7:1, 3:1]
2 indexes up to i in the set for prime 5, which are also members
of the set for prime 3
total 4
S_4: 21
prime hash [2:1,3-4; 5:0-1,4; 7:0,2-3; 3:0-2], counters [2:2: 5:2, 7:2, 3:2]
2 indexes up to i in the set for prime 7, which are also members
of the set for prime 3
total 6
6 coprime pairs:
(4,7),(4,15),(4,21),(7,10),(7,15),(10,21)
I would suggest :
1) Use Eratosthene to get a list of sorted prime numbers under 10^6.
2) For each number n in the list, get it's prime factors. Associate it another number f(n) in the following way : let's say that the prime factors of n are 3, 7 and 17. Then the binary representation of f(n) is :
`0 1 0 1 0 0 1`
The first digit (0 here) is associated to the prime number 2, the second (1 here) is associated to the prime number 3, etc ...
Therefore 2 numbers n and m are coprime iff f(n) & f(m) = 0.
3) It's easy to see that there is a N such that for each n : f(n) <= (2^N) - 1. This means that the biggest number f(n) is smaller or equal to a number whose binary representation is :
`1 1 1 1 1 1 1 1 1 1 1 1 1 1 1`
Here N is the number of 1 in the above sequence. Get this N and sort the list of numbers f(n). Let's call this list L.
If you want to optimize: in this list, instead of sorting duplicates, store a pair containing f(n) and the number of times f(n) is duplicated.
4) Iterate from 1 to N in this way : initialize i = 1 0 0 0 0, and at each iteration, move the digit 1 to the right with all other values kept to 0 (implement it using bitshift).
At each iteration, iterate over L to get the number d(i) of elements l in L such that i & l != 0 (be careful if you use the above optimization). In other words, for each i, get the number of elements in L which are not coprimes with i, and name this number d(i). Add the total
D = d(1) + d(2) + ... + d(N)
5) This number D is the number of pairs which are not coprime in the original list. The number of coprime pairs is :
M*(M-1)/2 - D
where M is the number of elements in the original list. The complexity of this method is O(n log(n)).
Good luck !
My previous answer was wrong, apologies. I propose here a modification:
Once you get the prime divisors of each number of the list, associate to each prime number p the number l(p) of numbers in the list which has p as divisor. For example consider the prime number 5, and the list's number which can be divided by 5 are 15, 100 and 255. Then l(5)=3.
To achieve it in O(n logn), iterate over the list and for each number in this list, iterate over it's prime factors; for each prime factor p, increment its l(p).
Then the number of pairs which are not coprime and can be divided by p is
l(p)*(l(p) - 1) / 2
Sum this number for all prime p, and you will get the number of pairs in the list which are not coprime (note that l(p) can be 0). Let say this sum is D, then the answer is
M*(M-1)/2 - D
where M is the length of the list. Good luck !

Seek for a bijective function maps sets to integers

For any two sequences a, b, where a = [a1,a2,...,an] and b = [b1,b2,...,bn] (0<=ai, bi<=m), I want to find an integer function f that f(a) = f(b) if and only if a, b have the same elements, without concerning about their orders.For example, if a = [1,1,2,3], b = [2,1,3,1], c = [3,2,1,3], then f(a) = f(b), f(a) ≠ f(b).
I know there is a naive algorithm which first sort the sequence and then map it to an integer.
For example, after sorting, we have a = [1,1,2,3], b = [1,1,2,3], c = [1,2,3,3], and suppose that m = 9, using decimal conversion, we will finally have f(a) = f(b) = 1123 ≠ f(c) = 1233. But this will take O(nlog(n)) time using some kind of sorting algorithm (Don't use non comparison sorting algorithms).
Is there a better approach ? Something like hash? An O(n) algorithm?
Note that I also need the function easy to be inversed, which means that we can map an integer back to a sequence (or a set, more concisely).
Update: Forgive my poor description. Here both m, n can be very large (1 million or larger). And I also want the upper bound of f to be quite small, preferably O(m^n).
This works for sufficiently small values of m, and sufficiently small array sizes:
#include <stdio.h>
unsigned primes [] = { 2,3,5,7,11,13,17, 19, 23, 29};
unsigned value(unsigned array[], unsigned count);
int main(void)
{
unsigned one[] = { 1,2,2,3,5};
unsigned two[] = { 2,3,1,5,2};
unsigned val1, val2;
val1 = value(one, 5);
val2 = value(two, 5);
fprintf(stdout, "Val1=%u, Val2=%u\n", val1, val2 );
return 0;
}
unsigned value(unsigned array[], unsigned count)
{
unsigned val, idx;
val = 1;
for (idx = 0; idx < count; idx++) {
val *= primes [ array[idx]];
}
return val;
}
For an explanation, see my description here.
Wow, #wildplasser answer is actually super smart. To expand a bit:
Any number can be decomposed in a unique fashion in prime numbers (this is known as the fundamental theorem of arithmetic). His answer relies on that, by building a number for which the input array is a representation of prime factor decomposition. As multiplication is commutative, the exact order of the elements in the array is of no importance, but a given number is associated to one (and only one) sequence of elements.
His solution can be expanded for arbitrary size, e.g. in Python:
import operator
import itertools
import math
class primes(object):
def __init__(self):
self.primes = [2,3,5,7,11]
self.stream = itertools.count(13, 2)
def __getitem__(self, i):
sq = int(math.sqrt(i))
while i >= len(self.primes):
n = self.stream.next()
while any(n % p == 0 for p in self.primes if p <= sq):
n = self.stream.next()
self.primes.append(n)
return self.primes[i]
def prod(itr):
return reduce(operator.mul, itr, 1)
p = primes()
def hash(array):
return prod(p[i] for i in array)
With the expected results:
>>> hash([1,2,2,3,5])
6825
>>> hash([5,3,2,2,1])
6825
Here, 6825 = 3^1 x 5^2 x 7^1 x 13^1, as 3 is the '1' prime (0-indexed), 5 the '2', etc...
>>> 3**1 * 5**2 * 7**1 * 13**1
6825
Building the number itself is O(n) multiplications, as long as the final result remains in the domain of int that you are using (unfortunately, I suspect it might get out of hand quite quickly). Building the series of prime number with an Eratosthenes Sieve as I did is asymptotically O(N * log log N), where N is the m-th largest prime. As asymptotically, N ~ m log m, this gives an overall complexity of O(n + m * log m * loglog (m * log m))
Using a similar approach, instead of taking the prime number decomposition, we can also consider the array to be a representation of the decomposition of a number in a base. To be consistent, this base has to be larger than the larger number of similar elements (e.g. for [5, 3, 3, 2, 1], base must be > 2 because there are two 3). Being on the safe side, you can write:
def hash2(array):
n = len(array)
return sum(n**i for i in array)
>>> hash2([1,5,3,2,2])
8070
>>> hash2([2,1,5,2,3])
8070
You can improve this by computing the largest number of similar elements in the array first, but the hash2 function will be a real hash only when used with the same basis, so the prime number decomposition is probably safe if you work with arrays of varying length and composition, as it will ALWAYS return the same unique integer per bag of numbers.

Why do we check up to the square root of a number to determine if the number is prime?

To test whether a number is prime or not, why do we have to test whether it is divisible only up to the square root of that number?
If a number n is not a prime, it can be factored into two factors a and b:
n = a * b
Now a and b can't be both greater than the square root of n, since then the product a * b would be greater than sqrt(n) * sqrt(n) = n. So in any factorization of n, at least one of the factors must be smaller than the square root of n, and if we can't find any factors less than or equal to the square root, n must be a prime.
Let's say m = sqrt(n) then m × m = n. Now if n is not a prime then n can be written as n = a × b, so m × m = a × b. Notice that m is a real number whereas n, a and b are natural numbers.
Now there can be 3 cases:
a > m ⇒ b < m
a = m ⇒ b = m
a < m ⇒ b > m
In all 3 cases, min(a, b) ≤ m. Hence if we search till m, we are bound to find at least one factor of n, which is enough to show that n is not prime.
Because if a factor is greater than the square root of n, the other factor that would multiply with it to equal n is necessarily less than the square root of n.
Suppose n is not a prime number (greater than 1). So there are numbers a and b such that
n = ab (1 < a <= b < n)
By multiplying the relation a<=b by a and b we get:
a^2 <= ab
ab <= b^2
Therefore: (note that n=ab)
a^2 <= n <= b^2
Hence: (Note that a and b are positive)
a <= sqrt(n) <= b
So if a number (greater than 1) is not prime and we test divisibility up to square root of the number, we will find one of the factors.
It's all really just basic uses of Factorization and Square Roots.
It may appear to be abstract, but in reality it simply lies with the fact that a non-prime-number's maximum possible factorial would have to be its square root because:
sqrroot(n) * sqrroot(n) = n.
Given that, if any whole number above 1 and below or up to sqrroot(n) divides evenly into n, then n cannot be a prime number.
Pseudo-code example:
i = 2;
is_prime = true;
while loop (i <= sqrroot(n))
{
if (n % i == 0)
{
is_prime = false;
exit while;
}
++i;
}
Let's suppose that the given integer N is not prime,
Then N can be factorized into two factors a and b , 2 <= a, b < N such that N = a*b.
Clearly, both of them can't be greater than sqrt(N) simultaneously.
Let us assume without loss of generality that a is smaller.
Now, if you could not find any divisor of N belonging in the range [2, sqrt(N)], what does that mean?
This means that N does not have any divisor in [2, a] as a <= sqrt(N).
Therefore, a = 1 and b = n and hence By definition, N is prime.
...
Further reading if you are not satisfied:
Many different combinations of (a, b) may be possible. Let's say they are:
(a1, b1), (a2, b2), (a3, b3), ..... , (ak, bk). Without loss of generality, assume ai < bi, 1<= i <=k.
Now, to be able to show that N is not prime it is sufficient to show that none of ai can be factorized further. And we also know that ai <= sqrt(N) and thus you need to check till sqrt(N) which will cover all ai. And hence you will be able to conclude whether or not N is prime.
...
So to check whether a number N is Prime or not.
We need to only check if N is divisible by numbers<=SQROOT(N). This is because, if we factor N into any 2 factors say X and Y, ie. N=XY.
Each of X and Y cannot be less than SQROOT(N) because then, XY < N
Each of X and Y cannot be greater than SQROOT(N) because then, X*Y > N
Therefore one factor must be less than or equal to SQROOT(N) ( while the other factor is greater than or equal to SQROOT(N) ).
So to check if N is Prime we need only check those numbers <= SQROOT(N).
Let's say we have a number "a", which is not prime [not prime/composite number means - a number which can be divided evenly by numbers other than 1 or itself. For example, 6 can be divided evenly by 2, or by 3, as well as by 1 or 6].
6 = 1 × 6 or 6 = 2 × 3
So now if "a" is not prime then it can be divided by two other numbers and let's say those numbers are "b" and "c". Which means
a=b*c.
Now if "b" or "c" , any of them is greater than square root of "a "than multiplication of "b" & "c" will be greater than "a".
So, "b" or "c" is always <= square root of "a" to prove the equation "a=b*c".
Because of the above reason, when we test if a number is prime or not, we only check until square root of that number.
Given any number n, then one way to find its factors is to get its square root p:
sqrt(n) = p
Of course, if we multiply p by itself, then we get back n:
p*p = n
It can be re-written as:
a*b = n
Where p = a = b. If a increases, then b decreases to maintain a*b = n. Therefore, p is the upper limit.
Update: I am re-reading this answer again today and it became clearer to me more. The value p does not necessarily mean an integer because if it is, then n would not be a prime. So, p could be a real number (ie, with fractions). And instead of going through the whole range of n, now we only need to go through the whole range of p. The other p is a mirror copy so in effect we halve the range. And then, now I am seeing that we can actually continue re-doing the square root and doing it to p to further half the range.
Let n be non-prime. Therefore, it has at least two integer factors greater than 1. Let f be the smallest of n's such factors. Suppose f > sqrt n. Then n/f is an integer ≤ sqrt n, thus smaller than f. Therefore, f cannot be n's smallest factor. Reductio ad absurdum; n's smallest factor must be ≤ sqrt n.
Any composite number is a product of primes.
Let say n = p1 * p2, where p2 > p1 and they are primes.
If n % p1 === 0 then n is a composite number.
If n % p2 === 0 then guess what n % p1 === 0 as well!
So there is no way that if n % p2 === 0 but n % p1 !== 0 at the same time.
In other words if a composite number n can be divided evenly by
p2,p3...pi (its greater factor) it must be divided by its lowest factor p1 too.
It turns out that the lowest factor p1 <= Math.square(n) is always true.
Yes, as it was properly explained above, it's enough to iterate up to Math.floor of a number's square root to check its primality (because sqrt covers all possible cases of division; and Math.floor, because any integer above sqrt will already be beyond its range).
Here is a runnable JavaScript code snippet that represents a simple implementation of this approach – and its "runtime-friendliness" is good enough for handling pretty big numbers (I tried checking both prime and not prime numbers up to 10**12, i.e. 1 trillion, compared results with the online database of prime numbers and encountered no errors or lags even on my cheap phone):
function isPrime(num) {
if (num % 2 === 0 || num < 3 || !Number.isSafeInteger(num)) {
return num === 2;
} else {
const sqrt = Math.floor(Math.sqrt(num));
for (let i = 3; i <= sqrt; i += 2) {
if (num % i === 0) return false;
}
return true;
}
}
<label for="inp">Enter a number and click "Check!":</label><br>
<input type="number" id="inp"></input>
<button onclick="alert(isPrime(+document.getElementById('inp').value) ? 'Prime' : 'Not prime')" type="button">Check!</button>
To test the primality of a number, n, one would expect a loop such as following in the first place :
bool isPrime = true;
for(int i = 2; i < n; i++){
if(n%i == 0){
isPrime = false;
break;
}
}
What the above loop does is this : for a given 1 < i < n, it checks if n/i is an integer (leaves remainder 0). If there exists an i for which n/i is an integer, then we can be sure that n is not a prime number, at which point the loop terminates. If for no i, n/i is an integer, then n is prime.
As with every algorithm, we ask : Can we do better ?
Let us see what is going on in the above loop.
The sequence of i goes : i = 2, 3, 4, ... , n-1
And the sequence of integer-checks goes : j = n/i, which is n/2, n/3, n/4, ... , n/(n-1)
If for some i = a, n/a is an integer, then n/a = k (integer)
or n = ak, clearly n > k > 1 (if k = 1, then a = n, but i never reaches n; and if k = n, then a = 1, but i starts form 2)
Also, n/k = a, and as stated above, a is a value of i so n > a > 1.
So, a and k are both integers between 1 and n (exclusive). Since, i reaches every integer in that range, at some iteration i = a, and at some other iteration i = k. If the primality test of n fails for min(a,k), it will also fail for max(a,k). So we need to check only one of these two cases, unless min(a,k) = max(a,k) (where two checks reduce to one) i.e., a = k , at which point a*a = n, which implies a = sqrt(n).
In other words, if the primality test of n were to fail for some i >= sqrt(n) (i.e., max(a,k)), then it would also fail for some i <= n (i.e., min(a,k)). So, it would suffice if we run the test for i = 2 to sqrt(n).

Finding the closest integer fraction to a given random real between 0..1, given ranges of numerator and denominator

Given two ranges of positive integers x: [1 ... n] and y: [1 ... m] and random real R from 0 to 1, I need to find the pair of elements (i,j) from x and y such that x_i / y_j is closest to R.
What is the most efficient way to find this pair?
Using Farey sequence
This is a simple and mathematically beautiful algorithm to solve this: run a binary search, where on each iteration the next number is given by the mediant formula (below). By the properties of the Farey sequence that number is the one with the smallest denominator within that interval. Consequently this sequence will always converge and never 'miss' a valid solution.
In pseudocode:
input: m, n, R
a_num = 0, a_denom = 1
b_num = 1, b_denom = 1
repeat:
-- interestingly c_num/c_denom is already in reduced form
c_num = a_num + b_num
c_denom = a_denom + b_denom
-- if the numbers are too big, return the closest of a and b
if c_num > n or c_denom > m then
if R - a_num/a_denom < b_num/b_denom - R then
return a_num, a_denom
else
return b_num, b_denom
-- adjust the interval:
if c_num/c_denom < R then
a_num = c_num, a_denom = c_denom
else
b_num = c_num, b_denom = c_denom
goto repeat
Even though it's fast on average (my educated guess that it's O(log max(m,n))), it can still be slow if R is close to a fraction with a small denominator. For example finding an approximation to 1/1000000 with m = n = 1000000 will take a million iterations.
The standard approach to approximating reals with rationals is computing the continued fraction series (see [1]). Put a limit on the nominator and denominator while computing parts of the series, and the last value before you break the limits is a fraction very close to your real number.
This will find a very good approximation very fast, but I'm not sure this will always find a closest approximation. It is known that
any convergent [partial value of the continued fraction expansion] is nearer to the continued fraction than any other fraction whose denominator is less than that of the convergent
but there may be approximations with larger denominator (still below your limit) that are better approximations, but are not convergents.
[1] http://en.wikipedia.org/wiki/Continued_fraction
Given that R is a real number such that 0 <= R <= 1, integers x: [1 ... n] and integers y: [1 ... m]. It is assumed that n <= m, since if n > m then x[n]/y[m] will be greater than 1, which cannot be the closest approximation to R.
Therefore, the best approximation of R with the denominator d will be either floor(R*d) / d or ceil(R*d) / d.
The problem can be solved in O(m) time and O(1) space (in Python):
from __future__ import division
from random import random
from math import floor
def fractionize(R, n, d):
error = abs(n/d - R)
return (n, d, error) # (numerator, denominator, absolute difference to R)
def better(a, b):
return a if a[2] < b[2] else b
def approximate(R, n, m):
best = (0, 1, R)
for d in xrange(1, m+1):
n1 = min(n, int(floor(R * d)))
n2 = min(n, n1 + 1) # ceil(R*d)
best = better(best, fractionize(R, n1, d))
best = better(best, fractionize(R, n2, d))
return best
if __name__ == '__main__':
def main():
R = random()
n = 30
m = 100
print R, approximate(R, n, m)
main()
Prolly get flamed, but a lookup might be best where we compute all of the fractional values for each of the possible values.. So a simply indexing a 2d array indexed via the fractional parts with the array element containing the real equivalent. I guess we have discrete X and Y parts so this is finite, it wouldnt be the other way around.... Ahh yeah, the actual searching part....erm reet....
Rather than a completely brute force search, do a linear search over the shortest of your lists, using round to find the best match for each element. Maybe something like this:
best_x,best_y=(1,1)
for x in 1...n:
y=max(1,min(m,round(x/R)))
#optional optimization (if you have a fast gcd)
if gcd(x,y)>1:
continue
if abs(R-x/y)<abs(R-bestx/besty):
best_x,best_y=(x,y)
return (best_x,best_y)
Not at all sure whether the gcd "optimization" will ever be faster...
The Solution:
You can do this O(1) space and O(m log(n)) time:
there is no need to create any list to search,
The pseudo code may be is buggy but the idea is this:
r: input number to search.
n,m: the ranges.
for (int i=1;i<=m;i++)
{
minVal = min(Search(i,1,n,r), minVal);
}
//x and y are start and end of array:
decimal Search(i,x,y,r)
{
if (i/x > r)
return i/x - r;
decimal middle1 = i/Cill((x+y)/2);
decimal middle2 = i/Roof((x+y)/2);
decimal dist = min(middle1,middle2)
decimal searchResult = 100000;
if( middle > r)
searchResult = Search (i, x, cill((x+y)/2),r)
else
searchResult = Search(i, roof((x+y)/2), y,r)
if (searchResult < dist)
dist = searchResult;
return dist;
}
finding the index as home work to reader.
Description: I think you can understand what's the idea by code, but let trace one of a for loop:
when i=1:
you should search within bellow numbers:
1,1/2,1/3,1/4,....,1/n
you check the number with (1,1/cill(n/2)) and (1/floor(n/2), 1/n) and doing similar binary search on it to find the smallest one.
Should do this for loop for all items, so it will be done m time. and in each time it takes O(log(n)). this function can improve by some mathematical rules, but It will be complicated, I skip it.
If the denominator of R is larger than m then use the Farey method (which the Fraction.limit_denominator method implements) with a limit of m to get a fraction a/b where b is smaller than m else let a/b = R. With b <= m, either a <= n and you are done or else let M = math.ceil(n/R) and re-run the Farey method.
def approx2(a, b, n, m):
from math import ceil
from fractions import Fraction
R = Fraction(a, b)
if R < Fraction(1, m):
return 1, m
r = R.limit_denominator(m)
if r.numerator > n:
M = ceil(n/R)
r = R.limit_denominator(M)
return r.numerator, r.denominator
>>> approx2(113, 205, 50, 200)
(43, 78)
It might be possible to just run the Farey method once using a limiting denominator of min(ceil(n/R), m) but I am not sure about that:
def approx(a, b, n, m):
from math import ceil
from fractions import Fraction
R = Fraction(a, b)
if R < Fraction(1, m):
return 1, m
r = R.limit_denominator(min(ceil(n/R), m))
return r.numerator, r.denominator

Generating shuffled range using a PRNG rather than shuffling

Is there any known algorithm that can generate a shuffled range [0..n) in linear time and constant space (when output produced iteratively), given an arbitrary seed value?
Assume n may be large, e.g. in the many millions, so a requirement to potentially produce every possible permutation is not required, not least because it's infeasible (the seed value space would need to be huge). This is also the reason for a requirement of constant space. (So, I'm specifically not looking for an array-shuffling algorithm, as that requires that the range is stored in an array of length n, and so would use linear space.)
I'm aware of question 162606, but it doesn't present an answer to this particular question - the mappings from permutation indexes to permutations given in that question would require a huge seed value space.
Ideally, it would act like a LCG with a period and range of n, but the art of selecting a and c for an LCG is subtle. Simply satisfying the constraints for a and c in a full period LCG may satisfy my requirements, but I am wondering if there are any better ideas out there.
Based on Jason's answer, I've made a simple straightforward implementation in C#. Find the next largest power of two greater than N. This makes it trivial to generate a and c, since c needs to be relatively prime (meaning it can't be divisible by 2, aka odd), and (a-1) needs to be divisible by 2, and (a-1) needs to be divisible by 4. Statistically, it should take 1-2 congruences to generate the next number (since 2N >= M >= N).
class Program
{
IEnumerable<int> GenerateSequence(int N)
{
Random r = new Random();
int M = NextLargestPowerOfTwo(N);
int c = r.Next(M / 2) * 2 + 1; // make c any odd number between 0 and M
int a = r.Next(M / 4) * 4 + 1; // M = 2^m, so make (a-1) divisible by all prime factors, and 4
int start = r.Next(M);
int x = start;
do
{
x = (a * x + c) % M;
if (x < N)
yield return x;
} while (x != start);
}
int NextLargestPowerOfTwo(int n)
{
n |= (n >> 1);
n |= (n >> 2);
n |= (n >> 4);
n |= (n >> 8);
n |= (n >> 16);
return (n + 1);
}
static void Main(string[] args)
{
Program p = new Program();
foreach (int n in p.GenerateSequence(1000))
{
Console.WriteLine(n);
}
Console.ReadKey();
}
}
Here is a Python implementation of the Linear Congruential Generator from FryGuy's answer. Because I needed to write it anyway and thought it might be useful for others.
import random
import math
def lcg(start, stop):
N = stop - start
# M is the next largest power of 2
M = int(math.pow(2, math.ceil(math.log(N+1, 2))))
# c is any odd number between 0 and M
c = random.randint(0, M/2 - 1) * 2 + 1
# M=2^m, so make (a-1) divisible by all prime factors and 4
a = random.randint(0, M/4 - 1) * 4 + 1
first = random.randint(0, M - 1)
x = first
while True:
x = (a * x + c) % M
if x < N:
yield start + x
if x == first:
break
if __name__ == "__main__":
for x in lcg(100, 200):
print x,
Sounds like you want an algorithm which is guaranteed to produce a cycle from 0 to n-1 without any repeats. There are almost certainly a whole bunch of these depending on your requirements; group theory would be the most helpful branch of mathematics if you want to delve into the theory behind it.
If you want fast and don't care about predictability/security/statistical patterns, an LCG is probably the simplest approach. The wikipedia page you linked to contains this (fairly simple) set of requirements:
The period of a general LCG is at most
m, and for some choices of a much less
than that. The LCG will have a full
period if and only if:
c and m are relatively prime,
a - 1 is divisible by all prime factors of m
a - 1 is a multiple of 4 if m is a multiple of 4
Alternatively, you could choose a period N >= n, where N is the smallest value that has convenient numerical properties, and just discard any values produced between n and N-1. For example, the lowest N = 2k - 1 >= n would let you use linear feedback shift registers (LFSR). Or find your favorite cryptographic algorithm (RSA, AES, DES, whatever) and given a particular key, figure out the space N of numbers it permutes, and for each step apply encryption once.
If n is small but you want the security to be high, that's probably the trickiest case, as any sequence S is likely to have a period N much higher than n, but is also nontrivial to derive a nonrepeating sequence of numbers with a shorter period than N. (e.g. if you could take the output of S mod n and guarantee nonrepeating sequence of numbers, that would give information about S that an attacker might use)
See my article on secure permutations with block ciphers for one way to do it.
Look into Linear Feedback Shift Registers, they can be used for exactly this.
The short way of explaining them is that you start with a seed and then iterate using the formula
x = (x << 1) | f(x)
where f(x) can only return 0 or 1.
If you choose a good function f, x will cycle through all values between 1 and 2^n-1 (where n is some number), in a good, pseudo-random way.
Example functions can be found here, e.g. for 63 values you can use
f(x) = ((x >> 6) & 1) ^ ((x >> 5) & 1)

Resources