Find most frequent combination of numbers in a set - algorithm

This question is related to the following questions:
How to find most frequent combinations of numbers in a list
Most frequently occurring combinations
My problem is:
Scenario:
I have a set of numbers; EACH COMBINATION IS UNIQUE in this set, and each number within a combination appears only once:
Goal:
Find the frequency of occurrence of combinations (of size 2) in this set.
Example:
The frequency threshold is 2.
Set = {1,12,13,134,135,235,2345,12345}
The degree-2 combinations that meet the threshold are (showing all combinations that appear at least 2 times):
13 - appears 4 times
23 - appears 3 times
14 - appears 2 times
12 - appears 2 times
...
The time complexity of exhaustively searching all possible combinations grows exponentially.
Can anyone help me find an algorithm that solves this problem faster? (hash table, XOR, tree search, ...)
Thank you
PS.
Don't worry about the space complexity
Solution and conclusion:
templatetypedef's answer is good for substring lengths greater than 3.
If the substring length is 2, btilly's answer is straightforward and easy to implement (and also performs well time-wise).

Here is pseudo-code whose running time should be O(n * m * m) where n is the size of the set, and m is the size of the things in that set:
let counts be a hash mapping a pair of characters to a count
foreach number N in list:
    foreach pair P of characters in N:
        if exists counts[P]:
            counts[P] = counts[P] + 1
        else:
            counts[P] = 1
let final be an array of (pair, count)
foreach P in keys of counts:
    if 1 < counts[P]:
        add (P, counts[P]) to final
sort final according to the counts
output final
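For concreteness, here is a runnable Python version of the pseudo-code above (the function name and the use of itertools.combinations are my choices, not part of the original):

from collections import defaultdict
from itertools import combinations

def pair_counts(numbers, threshold=2):
    # count every pair of characters (in order) within each number
    counts = defaultdict(int)
    for n in numbers:
        for pair in combinations(str(n), 2):
            counts[pair] += 1
    # keep pairs at or above the threshold, most frequent first
    final = [(p, c) for p, c in counts.items() if c >= threshold]
    final.sort(key=lambda pc: -pc[1])
    return final

for (x, y), c in pair_counts([1, 12, 13, 134, 135, 235, 2345, 12345]):
    print(x + y, "-", c)   # 13 - 4, 35 - 4, 34 - 3, ...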
@templatetypedef's answer will eventually be more efficient if you're looking for combinations of 3, 4, etc. characters. But this should be fine for the stated problem.

You can view this problem as a string problem: given a collection of strings, return all substrings of the collection that appear at least k times. Fortunately, there's a polynomial-time algorithm for this problem that uses generalized suffix trees.
Start by constructing a generalized suffix tree for the string representations of your numbers, which takes time linear in the number of digits across all numbers. Then do a DFS, annotating each node with the number of leaf nodes in its subtree (equivalently, the number of times the string represented by the node appears in the input set), and along the way output each string found to appear at least k times. The runtime for this operation is O(d + z), where d is the total number of digits in the input and z is the total number of digits produced as output.
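As a rough illustration of the counting step, here is a hedged sketch that builds an uncompressed generalized suffix trie (quadratic in the input size, unlike the linear-size compressed suffix tree described above) and reads the frequent substrings off the per-node counts:

class Node:
    def __init__(self):
        self.children = {}
        self.count = 0   # number of suffixes passing through this node

def frequent_substrings(strings, k):
    # simplified suffix *trie*, not the linear-time compressed suffix tree
    root = Node()
    for s in strings:
        for i in range(len(s)):          # insert every suffix of s
            node = root
            for ch in s[i:]:
                node = node.children.setdefault(ch, Node())
                node.count += 1          # one more occurrence of this prefix
    # DFS, emitting every substring that occurs at least k times
    out, stack = [], [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node.children.items():
            if child.count >= k:
                out.append((prefix + ch, child.count))
                stack.append((child, prefix + ch))
    return out

print(frequent_substrings(["134", "135", "2345"], 2))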
Hope this helps!

Related

Find four factors of a number such that their product is maximum and their sum is the original number

Given the number of test cases T and an integer N, you need to find four integers A, B, C, D such that they're all factors of N (A|N, B|N, C|N, D|N) and N = A+B+C+D. The goal is to maximize A * B * C * D. If it's not possible to find four such factors, simply return -1.
Input format for the problem is:
First line contains an integer T(1<=T<=40000), represents the number of test cases.
Each of the next T lines contains an integer N (1<=N<=40000, N^4 will not exceed 64 bit integer).
This question is on HackerEarth under the recursion category, but I'm not able to understand the algorithm in the editorial (editorial link: https://www.hackerearth.com/practice/basic-programming/recursion/recursion-and-backtracking/practice-problems/algorithm/divide-number-a410603f/editorial/).
In the editorial it's solved using unit fractions, but I'm not able to understand the algorithm (I've reproduced the editorial below in case you can't open the link; the points I can't follow are marked with ***). The brute-force solution results in TLE (Time Limit Exceeded). Please provide an algorithm or pseudo-code using DFS or backtracking.
My brute-force approach: calculate the factors of a number n in O(sqrt(n)) and store them in an array, then traverse the array to get A, B, C, D using four for-loops. But for T (1<=T<=40000) test cases it gets TLE.
Editorial(If you are not able to open the above link):-
Consider the equation N = A+B+C+D. If we divide both sides by N, we get 1 = A/N + B/N + C/N + D/N = 1/A' + 1/B' + 1/C' + 1/D', where A' = N/A and so on; A', B', C', D' are all integers because A, B, C, D are factors of N.
So the original problem is equivalent to dividing 1 into four unit fractions.
We can enumerate the unit fractions from large to small.
*** If we need to divide X into Y unit fractions, and the last unit fraction was 1/Z, we can enumerate unit fractions between 1/Z and X/Y (because we are enumerating the largest remaining fraction), and recursively solve.
*** After finding all solutions to 1 = 1/A' + 1/B' + 1/C' + 1/D' (about 20 solutions if the numbers are kept in order), we can enumerate them in each test case. If A', B', C', D' are all factors of N, we can use this solution to update the answer.
Time Complexity: O(T), where T is the number of Test cases.
*** If we need to divide X into Y unit fractions, and the last unit fraction was 1/Z, we can enumerate unit fractions between 1/Z and X/Y (because we are enumerating the largest remaining fraction), and recursively solve.
Answer: We are trying to find all combinations from 1 = 1/A + 1/B + 1/C + 1/D. Initially, we have X=1 and Y=4, and we are enumerating A as the largest factor, so 1/A should be no less than X/Y = 1/4. Because this is the first element, there is no last fraction 1/Z yet. Suppose we choose A=3; then the last fraction 1/Z is 1/A = 1/3, X = 1 - 1/3 = 2/3, and Y = 3. Now we choose 1/B from [X/Y, 1/Z] = [2/9, 1/3], and do the same thing for the next steps.
*** After finding all solutions to 1 = 1/A' + 1/B' + 1/C' + 1/D' (about 20 solutions if the numbers are kept in order), we can enumerate them in each test case. If A', B', C', D' are all factors of N, we can use this solution to update the answer.
Answer: Because 1/A should be no less than 1/4, A can only be 2, 3, or 4. If A == 4, then A=B=C=D; there is only one solution. If A == 3, then [X/Y, 1/Z] = [2/9, 1/3], so B can only be 3 or 4; if B == 4, then in the next round C must be 4, where [X/Y, 1/Z] = [5/24, 1/4]; if B == 3, then C can be 4, 5, or 6, because [X/Y, 1/Z] = [1/6, 1/3]. If A == 2, then [X/Y, 1/Z] = [1/6, 1/2], and B can be 3, 4, 5, or 6. You can do the rest of the calculation in code; it feels like we can cut off many search branches. (Ignore my enumeration order; you should start from A=2.)
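To make the editorial's enumeration concrete, here is a hedged Python sketch (the names are mine): it enumerates the denominators A', B', C', D' in nondecreasing order, exactly as described above, and then scores a given N against the precomputed solutions.

from fractions import Fraction

def unit_splits(target, parts, min_d=2):
    # enumerate denominators d1 <= d2 <= ... with 1/d1 + ... == target
    if target <= 0:
        return
    if parts == 1:
        if target.numerator == 1 and target.denominator >= min_d:
            yield (target.denominator,)
        return
    # largest remaining fraction 1/d satisfies target/parts <= 1/d <= target
    lo = max(min_d, -(-target.denominator // target.numerator))  # ceil(1/target)
    hi = parts * target.denominator // target.numerator          # floor(parts/target)
    for d in range(lo, hi + 1):
        for tail in unit_splits(target - Fraction(1, d), parts - 1, d):
            yield (d,) + tail

solutions = list(unit_splits(Fraction(1), 4))
print(len(solutions))   # 14 decompositions of 1 into four unit fractions

def best_product(n, solutions):
    # A', B', C', D' must all divide N; then A = N/A', etc.
    best = -1
    for denoms in solutions:
        if all(n % d == 0 for d in denoms):
            p = 1
            for d in denoms:
                p *= n // d
            best = max(best, p)
    return best

print(best_product(8, solutions))   # 16 (8 = 2+2+2+2)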
The time complexity of your code can be improved by using only 3 for-loops and applying binary search to find the fourth number, since the time complexity of binary search is log(n).
Time complexity = O(n^3 * log(n)), and according to the constraints of the question it should be able to pass all the test cases.
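For reference, a hedged sketch of that suggestion (here the loops run over the factor list of N, and the helper names are mine):

from bisect import bisect_left

def solve(n):
    # collect the factors of n in O(sqrt(n))
    fs = sorted({d for i in range(1, int(n**0.5) + 1) if n % i == 0
                 for d in (i, n // i)})
    best = -1
    for a in fs:
        for b in fs:
            for c in fs:
                d = n - a - b - c
                if d < 1:
                    break              # d only shrinks as c grows
                k = bisect_left(fs, d) # binary search for the fourth factor
                if k < len(fs) and fs[k] == d:
                    best = max(best, a * b * c * d)
    return best

print(solve(8))   # 16 (8 = 2+2+2+2)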

Can this be properly modeled with segment trees?

The problem I'm working on requires processing several queries on an array (the size of the array is less than 10k, the largest element is certainly less than 10^9).
A query consists of two integers, and one must find the total count of subarrays that have an equal count of these integers. There may be up to 5 * 10^5 queries.
For instance, given the array [1, 2, 1], and the query 1 2 we find that there are two subarrays with equal counts of 1 and 2, namely [1, 2] and [2, 1].
My initial approach was to use dynamic programming to construct a map such that memo[i][j] = the number of times the number i appears in the array up to index j. I would use this the way one uses prefix sums, but accumulating frequencies instead.
Constructing this map took me O(n^2). For each query, I'd do O(1) processing for each interval and increment the answer. This leads to a complexity of O((q + 1) * n(n - 1)/2) [q is the number of queries], which is to say O(n^2) per query, and I also want to emphasize that daunting constant factor.
After some rethinking, I'm trying to find a way to determine, for every subarray, the frequency count of each element. I strongly feel this problem is about segment trees; I've struggled to come up with a proper model, and this was the only thing I could think of.
However, my approach doesn't seem too useful here, considering the complexity of combining nodes that hold such a large amount of information, not to mention the memory overhead.
How can this be solved efficiently?
Idea 1
You can reduce the time for each query from O(n^2) to O(n) by computing the frequency count of the cumulative count difference:
from collections import defaultdict

def query(A, a, b):
    t = 0                    # running discrepancy: count(a) - count(b)
    freq = defaultdict(int)
    freq[0] = 1              # the empty prefix has discrepancy 0
    for x in A:
        if x == a:
            t += 1
        elif x == b:
            t -= 1
        freq[t] += 1
    # two prefixes with equal discrepancy bound a balanced subarray
    return sum(count * (count - 1) // 2 for count in freq.values())

print(query([1, 2, 1], 1, 2))   # 2
The idea is that t represents the running discrepancy between the counts of the two elements.
If we find two positions in the array with the same discrepancy, we can conclude that the subarray between these positions must contain an equal number of each.
The expression count*(count-1)//2 simply counts the number of ways of choosing two positions out of the count positions that share the same discrepancy.
Example
For example, suppose we have the array [1,1,1,2,2,2]. The values for the cumulative discrepancy (number of 1's take away number of 2's) will be:
0,1,2,3,2,1,0
Each pair of positions with the same number corresponds to a subarray with equal counts. E.g., looking at the pair of 2s (prefix positions 2 and 4), we find that the subarray between them, [1, 2], has equal counts.
Idea 2
If this is still not fast enough, you could optimize the query function to quickly skip over all elements that are not equal to a or b. For example, you could prepare a list for each element value that contains all the locations of that element.
Once you have this list, you can then instantly jump to the next location of either a or b. For all intermediate values we know the discrepancy will not change, so you can update the frequency by the number of skipped elements (instead of always adding just 1 to the count).
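A hedged sketch of Idea 2 (the data layout and names are mine): index every value's positions once, then each query walks only the occurrences of a and b, crediting an entire run of unchanged discrepancy in one step.

from collections import defaultdict

def preprocess(A):
    positions = defaultdict(list)   # value -> sorted list of indices
    for i, x in enumerate(A):
        positions[x].append(i)
    return positions

def query_fast(n, positions, a, b):
    # merge the occurrence lists of a and b, in index order
    events = sorted([(i, +1) for i in positions.get(a, [])] +
                    [(i, -1) for i in positions.get(b, [])])
    freq = defaultdict(int)
    t = 0        # current discrepancy
    prev = -1    # index of the previous occurrence of a or b
    for i, delta in events:
        freq[t] += i - prev   # prefixes prev+1 .. i all share discrepancy t
        t += delta
        prev = i
    freq[t] += n - prev       # remaining prefixes up to length n
    return sum(c * (c - 1) // 2 for c in freq.values())

A = [1, 2, 1]
pos = preprocess(A)
print(query_fast(len(A), pos, 1, 2))   # 2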

Why is the total number of possible substrings of a string n^2?

I read that the total number of substrings that can be formed from a given string is n^2 but I don't understand how to count this.
By substrings, I mean, given a string CAT, the substrings would be:
C
CA
CAT
A
AT
T
The total number of (nonempty) substrings is n + C(n,2). The leading n counts the substrings of length 1, and C(n,2) counts the substrings of length > 1: each corresponds to a choice of 2 indices (start and end) out of n. The standard formula for binomial coefficients yields C(n,2) = n*(n-1)/2. Combining these two terms and simplifying gives a total of (n^2 + n)/2.
@rici notes in the comments that this is the same as C(n+1,2), which makes sense if you think in terms of Python string slicing: every substring of s can be written as s[i:j] where 0 <= i < j <= n (with j being 1 more than the final index), so you are choosing 2 distinct values out of n+1. For n = 3 this works out to (9 + 3)/2 = 6.
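You can sanity-check the formula by enumerating the slices directly (a quick illustrative snippet):

def substrings(s):
    # every nonempty slice s[i:j] with 0 <= i < j <= len(s)
    n = len(s)
    return [s[i:j] for i in range(n) for j in range(i + 1, n + 1)]

subs = substrings("CAT")
print(subs)        # ['C', 'CA', 'CAT', 'A', 'AT', 'T']
print(len(subs))   # 6 == 3 * 4 // 2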
In the sense of complexity theory the number of substrings is O(n^2), which might be what you read somewhere.
You have a starting point and an end point: if each could point anywhere along the word, each would have n possible values, and therefore an overall n^2, so that's an upper limit.
However, we need a constraint saying that the substring cannot end before it starts, so end - start >= 0. This cuts the possible count roughly in half, but in asymptotic terms it's still O(n^2).
Substring calculation is logically selecting 2 blank spaces at least one letter apart:
a | b c | d = substring bc
| a b c | d = substring abc
Now, how many ways can you choose these 2 blank spaces? For an n-letter word there are n+1 positions.
First select one = n+1 ways
Select another (not the same) = n ways
So the total is n(n+1). But you have counted every choice twice, so it's n(n+1)/2.
Programmatically, without applying any special algorithms (like the Z algorithm etc.), you can use a map to calculate the number of distinct substrings in O(n^3).
You can use a suffix tree to get the substring calculation down to O(n^2).
To get a substring of a given string s, you just need to select two different points in the string. Let s contain n characters:
|s[0]|s[1]|...|s[n-1]|
You want to choose two vertical bars to get a substring. How many vertical bars do you have? Exactly n+1. So the number of substrings is C(n+1,2) = n(n+1)/2, i.e. the number of ways to choose 2 items from n+1. Of course, this can be bounded as O(n^2).

Check if array B is a permutation of A

I tried to find a solution to this but couldn't get anywhere.
We are given two unsorted integer arrays A and B. We have to check whether array B is a permutation of A. How can this be done? Even XORing the numbers won't work, as there are counterexamples that have the same XOR value but are not permutations of each other.
A solution needs to be O(n) time and O(1) space.
Any help is welcome!
Thanks.
The question is theoretical, but you can do it in O(n) time and O(1) space. Allocate an array of 2^32 counters and set them all to zero. This is an O(1) step because the array has constant size. Then iterate through the two arrays. For array A, increment the counters corresponding to the integers read. For array B, decrement them. If you run into a negative counter value during iteration of array B, stop: the arrays are not permutations of each other. Otherwise at the end (assuming A and B have the same size, a prerequisite) the counter array is all zero, and the two arrays are permutations of each other.
This is an O(1) space and O(n) time solution. It is not practical, but it would easily pass as a solution to the interview question. At least it should.
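A practical Python sketch of the same counting idea, with a dictionary standing in for the fixed 2^32-entry counter array (this trades away the O(1)-space argument for practicality):

def is_permutation(a, b):
    if len(a) != len(b):
        return False
    counts = {}                 # stand-in for the 2^32 counter array
    for x in a:
        counts[x] = counts.get(x, 0) + 1
    for x in b:
        c = counts.get(x, 0)
        if c == 0:              # counter would go negative: not a permutation
            return False
        counts[x] = c - 1
    return True

print(is_permutation([3, 1, 2], [2, 3, 1]))   # True
print(is_permutation([1, 1, 2], [1, 2, 2]))   # False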
More obscure solutions
Using a nondeterministic model of computation, checking that the two arrays are not permutations of each other can be done in O(1) space and O(n) time by guessing an element that has a differing count in the two arrays, and then counting the instances of that element in both of them.
In a randomized model of computation, construct a random commutative hash function and calculate the hash values of the two arrays. If the hash values differ, the arrays are not permutations of each other. Otherwise they might be. Repeat many times to bring the probability of error below the desired threshold. This is also an O(1) space, O(n) time approach, but randomized.
In a parallel computation model, let n be the size of the input array. Allocate n threads. Every thread i = 1 .. n reads the ith number from the first array; let that be x. Then the same thread counts the number of occurrences of x in the first array, and checks for the same count in the second array. Every single thread uses O(1) space and O(n) time.
Interpret an integer array [a1, ..., an] as the polynomial x^a1 + x^a2 + ... + x^an, where x is a free variable, and check numerically for the equivalence of the two polynomials obtained. Using floating-point arithmetic gives an O(1) space and O(n) time operation. It is not an exact method, because of rounding errors and because the numerical equivalence check is probabilistic. Alternatively, interpret the polynomials over the integers modulo a prime number, and perform the same probabilistic check.
If we are allowed to freely access a large list of primes, you can solve this problem by leveraging properties of prime factorization.
For both arrays, calculate the product of Prime[i] for each integer i, where Prime[i] is the ith prime number. The products of the two arrays are equal iff they are permutations of one another.
Prime factorization helps here for two reasons.
Multiplication is commutative and associative, so the ordering of the operands when calculating the product is irrelevant. (Some alluded to the fact that if the arrays were sorted, this problem would be trivial. By multiplying, we are implicitly sorting.)
Prime numbers multiply losslessly. If we are given a number and told it is the product of only prime numbers, we can calculate exactly which prime numbers were fed into it and exactly how many.
Example:
a = 1,1,3,4
b = 4,1,3,1
Product of ith primes in a = 2 * 2 * 5 * 7 = 140
Product of ith primes in b = 7 * 2 * 5 * 2 = 140
That said, we probably aren't allowed access to a list of primes, but this seems a good solution otherwise, so I thought I'd post it.
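For illustration, a sketch of this method assuming sympy's prime(i) for the i-th prime and positive integer inputs, as in the example:

from math import prod
from sympy import prime   # prime(i) is the i-th prime: prime(1) == 2

def perm_check_primes(a, b):
    # equal products of value-indexed primes <=> equal multisets
    return prod(prime(x) for x in a) == prod(prime(x) for x in b)

print(perm_check_primes([1, 1, 3, 4], [4, 1, 3, 1]))   # True (140 == 140)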
I apologize for posting this as an answer, as it should really be a comment on antti.huima's answer, but I don't have the reputation yet to comment.
The size of each counter seems to be O(log(n)) bits, as it depends on the number of instances of a given value in the input array.
For example, let the input array A be all 1's with a length of (2^32) + 1. This will require a counter of size 33 bits to encode (which, in practice, would double the size of the array, but let's stay with theory). Double the size of A (still all 1 values) and you need 65 bits for each counter, and so on.
This is a very nit-picky argument, but these interview questions tend to be very nit-picky.
If we need not sort this in-place, then the following approach might work:
Create a HashMap with the array element as key and its number of occurrences as value (to handle multiple occurrences of the same number).
Traverse array A.
Insert the array elements in the HashMap.
Next, traverse array B.
Search for every element of B in the HashMap. If the corresponding value is 1, delete the entry; else, decrement the value by 1.
If we are able to process the entire array B and the HashMap is empty at the end, success; else failure.
The HashMap will use O(n) space, and you will traverse each array only once.
Not sure if this is what you are looking for. Let me know if I have missed any constraint about space/time.
You're given two constraints: computational O(n), where n means the total length of both A and B, and memory O(1).
If two series A and B are permutations of each other, then there's also a series C resulting from a permutation of either A or B. So the problem is permuting both A and B into series C_A and C_B and comparing them.
One such permutation is sorting. There are several sorting algorithms that work in place, so you can sort A and B in place. In the best case smoothsort sorts with O(n) computational and O(1) memory complexity, and in the worst case with O(n log n) / O(1).
The per-element comparison then happens in O(n), but since in O notation O(2n) = O(n), using smoothsort plus a comparison gives you an O(n) / O(1) check of whether two series are permutations of each other. In the worst case, however, it will be O(n log n) / O(1).
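In Python, the sort-then-compare check is tiny (note the built-in sort is Timsort, not smoothsort, so this illustrates the approach rather than the O(1)-space bound):

def are_permutations(a, b):
    # sort both in place, then compare element by element
    a.sort()
    b.sort()
    return a == b

print(are_permutations([3, 1, 2], [2, 3, 1]))   # True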
The solution needs to be O(n) time and with space O(1).
This rules out sorting, and the O(1) space requirement is a hint that you probably should compute a hash of the strings and compare them.
If you have access to a prime number list, do as cheeken's solution suggests.
Note: if the interviewer says you don't have access to a prime number list, then generate the prime numbers and store them. This is O(1) because the alphabet length is a constant.
Else here's my alternative idea. I will define the Alphabet as = {a,b,c,d,e} for simplicity.
The values for the letters are defined as:
a, b, c, d, e
1, 2, 4, 8, 16
Note: if the interviewer says this is not allowed, then make a lookup table for the Alphabet; this takes O(1) space because the size of the Alphabet is a constant.
Define a function which can find the distinct letters in a string.
// set bit value of char c in variable i and return result
distinct(char c, int i) : int
E.g. distinct('a', 0) returns 1
E.g. distinct('a', 1) returns 1
E.g. distinct('b', 1) returns 3
Thus if you iterate the string "aab" the distinct function should give 3 as the result
Define a function which can calculate the sum of the letters in a string.
// return sum of c and i
sum(char c, int i) : int
E.g. sum('a', 0) returns 1
E.g. sum('a', 1) returns 2
E.g. sum('b', 2) returns 4
Thus if you iterate the string "aab" the sum function should give 4 as the result
Define a function which can calculate the length of a string.
// return length of string s
length(string s) : int
E.g. length("aab") returns 3
Running the methods on two strings and comparing the results takes O(n) running time. Storing the hash values takes O(1) in space.
e.g.
distinct of "aab" => 3
distinct of "aba" => 3
sum of "aab" => 4
sum of "aba" => 4
length of "aab" => 3
length of "aba" => 3
Since all the values are equal for both strings, they must be a permutation of each other.
EDIT: The solution is not correct with the given alphabet values, as pointed out in the comments.
You can convert one of the two arrays into an in-place hash table. This will not be exactly O(N), but it will come close in non-pathological cases.
Just use [number % N] as its desired index, or a slot in the chain that starts there. If any element has to be replaced, it can be placed at the index where the offending element started. Rinse, wash, repeat.
UPDATE:
This is a similar (N=M) hash table. It did use chaining, but it could be downgraded to open addressing.
I'd use a randomized algorithm that has a low chance of error.
The key is to use a universal hash function.
def hash_array(array, hash_fn):
    # combine per-item hashes with XOR, which is order-independent
    cur = 0
    for item in array:
        cur ^= hash_fn(item)
    return cur

def are_perm(a1, a2):
    hash_fn = pick_random_universal_hash_func()  # assumed helper
    return hash_array(a1, hash_fn) == hash_array(a2, hash_fn)
If the arrays are permutations, it will always be right. If they are different, the algorithm might incorrectly say that they are the same, but it will do so with very low probability. Further, you can get an exponential decrease in the chance of error for a linear amount of extra work by asking many are_perm() questions on the same input; if it ever says no, they are definitely not permutations of each other.
I just found a counterexample, so the assumption below is incorrect.
I cannot prove it, but I think this may possibly be true.
Since all elements of the arrays are integers, suppose each array has 2 elements,
and we have
a1 + a2 = s
a1 * a2 = m
b1 + b2 = s
b1 * b2 = m
then {a1, a2} == {b1, b2}
if this is true, it's also true for arrays with n elements.
So we compare the sum and product of each array; if both are equal, one is a permutation of the other.

Greatest GCD between some numbers

We've got some nonnegative numbers. We want to find the pair with the maximum gcd. Actually this maximum is more important than the pair!
For example if we have:
2 4 5 15
gcd(2,4)=2
gcd(2,5)=1
gcd(2,15)=1
gcd(4,5)=1
gcd(4,15)=1
gcd(5,15)=5
The answer is 5.
You can use the Euclidean Algorithm to find the GCD of two numbers.
int gcd(int a, int b)
{
    while (b != 0)
    {
        int m = a % b;
        a = b;
        b = m;
    }
    return a;
}
If you want an alternative to the obvious algorithm, then assuming your numbers are in a bounded range, and you have plenty of memory, you can beat O(N^2) time, N being the number of values:
Create an array of a small integer type, indexes 1 to the max input. O(1)
For each value, increment the counts at every index that is a factor of that value (make sure you don't wrap around the small integer type). O(N) values to process.
Starting at the end of the array, scan back until you find a count >= 2; that index is the maximum gcd. O(max value) in the worst case.
That tells you the max gcd, but doesn't tell you which pair produced it. For your example input, the computed array looks like this:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
 4  2  1  1  2  0  0  0  0  0  0  0  0  0  1
I don't know whether this is actually any faster for the inputs you have to handle. The constant factors involved are large: the bound on your values and the time to factorise a value within that bound.
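A hedged Python sketch of this counting scheme, assuming positive inputs (the names are mine):

def max_gcd(values):
    mx = max(values)
    counts = [0] * (mx + 1)       # counts[d] = how many inputs d divides
    for v in values:
        d = 1
        while d * d <= v:         # trial division to find the factors of v
            if v % d == 0:
                counts[d] += 1
                if d != v // d:
                    counts[v // d] += 1
            d += 1
    for g in range(mx, 0, -1):    # scan back from the top
        if counts[g] >= 2:
            return g
    return None

print(max_gcd([2, 4, 5, 15]))     # 5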
You don't have to factorise each value - you could use memoisation and/or a pregenerated list of primes. Which gives me the idea that if you are memoising the factorisation, you don't need the array:
Create an empty set of int, and a best-so-far value 1.
For each input integer:
if it's less than or equal to best-so-far, continue.
check whether it's in the set. If so, best-so-far = max(best-so-far, this-value), continue. If not:
add it to the set
repeat for all of its factors (larger than best-so-far).
Add/lookup in a set could be O(log N), although it depends what data structure you use. Each value has O(f(k)) factors, where k is the max value and I can't remember what the function f is...
The reason that you're finished with a value as soon as you encounter it in the set is that you've found a number which is a common factor of two input values. If you keep factorising, you'll only find smaller such numbers, which are not interesting.
I'm not quite sure what the best way is to repeat for the larger factors. I think in practice you might have to strike a balance: you don't want to do them quite in decreasing order because it's awkward to generate ordered factors, but you also don't want to actually find all the factors.
Even in the realms of O(N^2), you might be able to beat the use of the Euclidean algorithm:
Fully factorise each number, storing it as a sequence of exponents of primes (so for example 2 is {1}, 4 is {2}, 5 is {0, 0, 1}, 15 is {0, 1, 1}). Then you can calculate gcd(a,b) by taking the min value at each index and multiplying them back out. No idea whether this is faster than Euclid on average, but it might be. Obviously it uses a load more memory.
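For instance, a small sketch of the factorise-once idea, using dicts of prime exponents instead of positional exponent lists:

from collections import Counter

def factorize(n):
    # trial-division factorisation: prime -> exponent
    f = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            f[d] += 1
            n //= d
        d += 1
    if n > 1:
        f[n] += 1
    return f

def gcd_from_factors(fa, fb):
    # min exponent for each shared prime, multiplied back out
    g = 1
    for p in fa.keys() & fb.keys():
        g *= p ** min(fa[p], fb[p])
    return g

print(gcd_from_factors(factorize(12), factorize(18)))   # 6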
The optimisations I can think of are:
1) Start with the two biggest numbers, since they are likely to have the most prime factors and thus likely to share the most prime factors (and thus the highest GCD).
2) When calculating the GCDs of other pairs, you can stop your Euclidean algorithm loop if you get below your current greatest GCD.
Off the top of my head I can't think of a way to work out the greatest GCD of a pair without trying each pair individually (and optimising a bit as above).
Disclaimer: I've never looked at this problem before and the above is off the top of my head. There may be better ways and I may be wrong. I'm happy to discuss my thoughts in more length if anybody wants. :)
There is no O(n log n) solution to this problem in general. In fact, the worst case is O(n^2) in the number of items in the list. Consider the following set of numbers:
2^20 3^13 5^9 7^2*11^4 7^4*11^3
Only the GCD of the last two is greater than 1, but the only way to know that from looking at the GCDs is to try out every pair and notice that one of them is greater than 1.
So you're stuck with the boring brute-force try-every-pair approach, perhaps with a couple of clever optimizations to avoid doing needless work when you've already found a large GCD (while making sure that you don't miss anything).
With some constraints, e.g. if the numbers in the array are within a given range, say 1 to 1e7, it is doable in O(N log N) / O(MAX log MAX), where MAX is the maximum possible value in A.
Inspired by the sieve algorithm; I came across it in a HackerRank challenge, where it is done for two arrays. Check their editorial.
find min(A) and max(A) - O(N)
create a binary mask to mark which elements of A appear in the given range, for O(1) lookup; O(N) to build; O(MAX_RANGE) storage.
for every number a in the range (min(A), max(A)):
    for aa = a; aa <= max(A); aa += a:
        if aa is in A, increment a counter for a; once the counter reaches 2 (i.e., you have two numbers divisible by a), compare a to the current max_gcd;
store the top two candidates for each GCD candidate.
You could also skip candidate values of a that are less than the current max_gcd. (A sketch of this scan follows below.)
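Here is a hedged Python sketch of that sieve-style scan (simplified to return just the max gcd; the function name is mine):

def max_gcd_sieve(A):
    mx = max(A)
    present = [0] * (mx + 1)           # multiplicity of each value in A
    for x in A:
        present[x] += 1
    for d in range(mx, 0, -1):         # try the largest candidate divisor first
        cnt = 0
        for m in range(d, mx + 1, d):  # walk the multiples of d
            cnt += present[m]
            if cnt >= 2:               # two elements divisible by d
                return d
    return None

print(max_gcd_sieve([2, 4, 5, 15]))    # 5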
Previous answer:
Still O(N^2) - sort the array first; this should eliminate some of the unnecessary comparisons:
max_gcd = 1
# assuming you want pairs of distinct elements
sort(a)  # assume in place, ascending
for ii = n - 1 : -1 : 0 do
    if a[ii] <= max_gcd
        break
    for jj = ii - 1 : -1 : 0 do
        if a[jj] <= max_gcd
            break
        current_gcd = GCD(a[ii], a[jj])
        if current_gcd > max_gcd:
            max_gcd = current_gcd
This should save some unnecessary computation.
There is a solution that would take O(n) gcd computations:
Let our numbers be a_i. First, calculate m = a_0*a_1*a_2*... For each number a_i, calculate gcd(m/a_i, a_i). The number you are looking for is the maximum of these values.
I haven't proved that this is always true, but in your example, it works:
m=2*4*5*15=600,
max(gcd(m/2,2), gcd(m/4,4), gcd(m/5,5), gcd(m/15,15))=max(2, 2, 5, 5)=5
NOTE: This is not correct. If the number a_i has a prime factor p_j repeated twice, and two other numbers also contain this factor p_j, then you get the incorrect result p_j^2 instead of p_j. For example, for the set 3, 5, 15, 25, you get 25 as the answer instead of 5.
However, you can still use this to quickly filter out numbers. For example, in the above case, once you determine 25, you can first do the exhaustive search for a_3=25 with gcd(a_3, a_i) to find the real maximum, 5, then filter out all gcd(m/a_i, a_i), i!=3, which are less than or equal to 5 (in the example above, this filters out all the others).
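A quick sketch of computing the g_i upper bounds, reproducing both the worked example and the failure case from the note:

from math import gcd, prod

def gcd_bounds(a):
    # g_i = gcd(m / a_i, a_i) is an upper bound on max_j gcd(a_i, a_j)
    m = prod(a)
    return [gcd(m // x, x) for x in a]

print(gcd_bounds([2, 4, 5, 15]))    # [2, 2, 5, 5] -> max 5 (correct here)
print(gcd_bounds([3, 5, 15, 25]))   # [3, 5, 15, 25] -> 25 is only a bound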
Added for clarification and justification:
To see why this should work, note that gcd(a_i, a_j) divides gcd(m/a_i, a_i) for all j!=i.
Let's call gcd(m/a_i, a_i) as g_i, and max(gcd(a_i, a_j),j=1..n, j!=i) as r_i. What I say above is g_i=x_i*r_i, and x_i is an integer. It is obvious that r_i <= g_i, so in n gcd operations, we get an upper bound for r_i for all i.
The above claim is not very obvious. Let's examine it a bit deeper to see why it is true: the gcd of a_i and a_j is the product of all prime factors that appear in both a_i and a_j (by definition). Now, multiply a_j with another number, b. The gcd of a_i and b*a_j is either equal to gcd(a_i, a_j), or is a multiple of it, because b*a_j contains all prime factors of a_j, and some more prime factors contributed by b, which may also be included in the factorization of a_i. In fact, gcd(a_i, b*a_j)=gcd(a_i/gcd(a_i, a_j), b)*gcd(a_i, a_j), I think. But I can't see a way to make use of this. :)
Anyhow, in our construction, m/a_i is simply a shortcut to calculate the product of all a_j, where j=1..n, j!=i. As a result, gcd(m/a_i, a_i) contains all the gcd(a_i, a_j) as factors. So, obviously, the maximum of these individual gcd results will divide g_i.
Now, the largest g_i is of particular interest to us: it is either the maximum gcd itself (if x_i is 1), or a good candidate for being one. To do that, we do another n-1 gcd operations, and calculate r_i explicitly. Then, we drop all g_j less than or equal to r_i as candidates. If we don't have any other candidate left, we are done. If not, we pick up the next largest g_k, and calculate r_k. If r_k <= r_i, we drop g_k, and repeat with another g_k'. If r_k > r_i, we filter out remaining g_j <= r_k, and repeat.
I think it is possible to construct a number set that will make this algorithm run in O(n^2) (if we fail to filter out anything), but on random number sets, I think it will quickly get rid of large chunks of candidates.
pseudocode
function getGcdMax(array[])
    arrayUB = upperbound(array)
    if (arrayUB < 1)
        error
    pointerA = 0
    pointerB = 1
    gcdMax = 0
    do
        gcdMax = MAX(gcdMax, gcd(array[pointerA], array[pointerB]))
        pointerB++
        if (pointerB > arrayUB)
            pointerA++
            pointerB = pointerA + 1
    until (pointerB > arrayUB)
    return gcdMax
