Programming Pearls - Random Select algorithm


Page 120 of Programming Pearls 1st edition presents this algorithm for selecting M equally probable random elements out of a population of N integers.
InitToEmpty
Size := 0
While Size < M do
    T := RandInt(1, N)
    if not Member(T)
        Insert(T)
        Size := Size + 1
It is stated that the expected number of Member tests is less than 2M, as long as M < N/2.
I'd like to know how to prove it, but my algorithm analysis background is failing me.
I understand that the closer M is to N, the longer the program will take, because the result set will have more elements and the likelihood of RandInt selecting an existing one will increase proportionally.
Can you help me figure out this proof?
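A minimal Python sketch of the book's algorithm, assuming a Python `set` stands in for InitToEmpty/Member/Insert and `random.Random.randint` (inclusive at both ends) stands in for RandInt. The name `random_select` and the returned test counter are illustrative additions, not part of the book. Averaging the counter over many runs gives an empirical check of the 2M claim.

```python
import random


def random_select(m: int, n: int, rng: random.Random) -> tuple[set[int], int]:
    """Return m distinct integers from 1..n and the number of Member tests used.

    Hypothetical helper mirroring the pseudocode; a set plays the role of
    InitToEmpty / Member / Insert.
    """
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    chosen: set[int] = set()          # InitToEmpty
    member_tests = 0
    while len(chosen) < m:            # While Size < M (Size is len(chosen))
        t = rng.randint(1, n)         # T := RandInt(1, N), both ends inclusive
        member_tests += 1             # one Member(T) test per draw
        if t not in chosen:           # if not Member(T)
            chosen.add(t)             # Insert(T); Size := Size + 1
    return chosen, member_tests


rng = random.Random(2024)
runs = [random_select(40, 100, rng)[1] for _ in range(1000)]
print(sum(runs) / len(runs))  # empirical mean number of Member tests for M = 40, N = 100
```

With M = 40 and N = 100 the printed average should land well below 2M = 80.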

I am not a math wizard, but I will give it a rough shot. This is NOT guaranteed to be right though.
For each additional member of M, you pick a number and check whether it is already there; if it isn't, you add it. Otherwise, you try again. The number of tries needed until the first success follows a geometric probability distribution.
http://en.wikipedia.org/wiki/Geometric_distribution
So you are running M geometric trials. A geometric trial with success probability p takes 1/p tries on average, so each trial takes an expected 1/p tries to get a number not already in M. Here p is the number of numbers not yet picked divided by N (i.e. unpicked items / total items): once K numbers have been added, p = (N - K)/N. So for the fourth number, p = (N - 3)/N, which is the probability of picking an unused number, and the expected number of picks for the fourth number is N/(N - 3).
The expected value of the run time is all of these added together. So something like
E(run time) = N/N + N/(N-1) + N/(N-2) + ... + N/(N-M+1)
Now if M < N/2, then the last term in that summation is bounded above by 2: N - M + 1 > N/2, so N/(N-M+1) < N/(N/2) = 2. It's also obviously the largest term in the whole summation. So if the biggest term is less than two picks, and there are M terms being summed, the expected run time is bounded above by 2M.
Ask me if any of this is unclear. Correct me if any of this is wrong :)
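The sum for E(run time) can also be evaluated exactly with rational arithmetic. This hypothetical helper adds N/(N - K) for K = 0, ..., M - 1 (K elements already chosen) and checks the 2M bound for every small N and M with M < N/2.

```python
from fractions import Fraction


def expected_member_tests(m: int, n: int) -> Fraction:
    """Exact E[#Member tests]: sum of the geometric means N/(N-K), K = 0..M-1."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    return sum((Fraction(n, n - k) for k in range(m)), Fraction(0))


# Every term is below 2 whenever M < N/2, so the total stays below 2M.
for n in range(2, 40):
    for m in range(1, n):
        if 2 * m < n:
            assert expected_member_tests(m, n) < 2 * m

print(expected_member_tests(2, 4))  # prints 7/3, i.e. 1 + 4/3
```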

Say we have chosen K elements out of N. Then our next try has probability (N-K)/N of succeeding, so the number of tries that it takes to find the (K+1)st element is geometrically distributed with mean N/(N-K).
So if 2M < N, every K we encounter satisfies K ≤ M - 1 < N/2, hence N/(N-K) < 2, and we expect it to take fewer than two tries to get each element.

Related

Finding the theoretical bound of local spikes in an array

You are given an array A[1..n], which consists of randomly permuted distinct integers.
An element of this array, A[i], is said to be a local spike if it is larger than all of its preceding elements (in other words, A[i] > A[j] for all j < i).
Show that the expected number of local spikes in A is O(log n).
If anybody can give me pointers to this question, it would be much appreciated!

It is similar to the reasoning about the quicksort time complexity.
So even though it is more about statistics, it can serve as a nice example of reasoning about algorithm complexity. Maybe it would be better suited to Computer Science Stack Exchange than to the statistics site? That being said, let's dive into the rabbit hole.
First, since all the numbers are distinct, only their relative order matters, so we can drop the actual values and simply take the integers 1, 2, ..., N without loss of generality.
Now we can change the way of looking at the problem. Instead of having the array, we can say that we are choosing random numbers from the range 1..N one at a time, without repetition.
Another observation is that by choosing a number X, regardless of whether it is a local spike, we disqualify all the numbers lower than X from ever being a local spike.
Since we are now choosing the numbers, we can thus discard every Y with Y < X from the candidate pool. This is safe because wherever such a lower number ends up, it changes nothing for the subsequent spikes: a spike always has to be bigger than the maximum of the previous elements.
So the question becomes how many times can we repeat this procedure of:
Select a number from the pool of candidates as a new spike
Discard all the lower numbers
We keep going until the whole candidate pool (which starts as the full 1..N range) has been discarded. Not surprisingly, this is almost the same as the expected depth of quicksort's recursion, which is O(log n).
A quick explanation if you don't want to check the wiki: most of the time, we will discard about half of the candidates. Sometimes fewer, sometimes more, but in the long run half is a rather good estimate. A more in-depth explanation can be found here.
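The candidate-pool procedure can also be analysed exactly instead of with the "about half" estimate. Picking uniformly from a pool of k candidates leaves a pool whose size is uniform on 0..k-1, so the expected number of rounds satisfies E(k) = 1 + (E(0) + ... + E(k-1))/k with E(0) = 0. A small sketch (function names are illustrative) evaluates this recursion with exact fractions and compares it with the harmonic number H_n = 1 + 1/2 + ... + 1/n:

```python
from fractions import Fraction


def expected_rounds(n: int) -> Fraction:
    """Expected rounds to empty a pool of n candidates, picking uniformly each time."""
    e = [Fraction(0)]                 # E(0) = 0: an empty pool needs no rounds
    for k in range(1, n + 1):
        # One round, after which the pool size is uniform on 0..k-1.
        e.append(1 + sum(e) / k)      # sum(e) is E(0) + ... + E(k-1) at this point
    return e[n]


def harmonic(n: int) -> Fraction:
    return sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))


for n in range(1, 30):
    assert expected_rounds(n) == harmonic(n)
```

So the expected number of spikes is exactly H_n, which grows like ln n.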

An elegant way to determine the solution to this problem is the following:
Define binary random variables X1, X2, ..., Xn by
Xi = 1 if A[i] is a local spike
Xi = 0 if A[i] is not a local spike
We see that the total number of local spikes is always the sum of the Xi. And we know that
E[X1 + X2 + ... + Xn] = E[X1] + E[X2] + ... + E[Xn]
by linearity of expectation. So we must now turn our attention to deducing E[Xi] for each i.
Now E[Xi] = P(A[i] is a spike). What is the probability that A[i] > A[j] for all j < i?
This is just the probability that the maximum element of A[1], A[2], ..., A[i] is A[i]. But this maximum element could be located anywhere from A[1] to A[i] with equal probability. So the probability is 1/i that the maximum element is A[i].
So E[Xi] = 1/i. Then we see that
E[total number of spikes] = E[X1] + E[X2] + ... + E[Xn] = 1/1 + 1/2 + ... + 1/n
This is the nth harmonic number, Hn. And it is well known that Hn ~ ln(n). This is because ln(n) <= Hn <= ln(n) + 1 for all n (easy proof involving Riemann sums, but requires a smidge of calculus). This shows that there are O(log n) spikes, on average.
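A brute-force check of E[number of spikes] = Hn for small n, as a sketch: it averages the spike count over all n! orderings of 1..n with exact fractions (helper names are hypothetical).

```python
from fractions import Fraction
from itertools import permutations


def count_spikes(a: list[int]) -> int:
    """Number of positions i with a[i] larger than every earlier element."""
    spikes, best = 0, None
    for x in a:
        if best is None or x > best:
            spikes += 1
            best = x
    return spikes


def average_spikes(n: int) -> Fraction:
    """Exact mean of count_spikes over all n! orderings of 1..n."""
    perms = list(permutations(range(1, n + 1)))
    return Fraction(sum(count_spikes(list(p)) for p in perms), len(perms))


for n in range(1, 8):
    h_n = sum((Fraction(1, i) for i in range(1, n + 1)), Fraction(0))
    assert average_spikes(n) == h_n
```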

How many times variable m is updated

Given the following pseudo-code, the question is how many times, on average, the variable m is updated.
A[1..n]: array with n random elements
m = A[1]
for i = 2 to n do
    if A[i] < m then m = A[i]
end for
One might answer that, since all elements are random, the variable will be updated on average in half of the loop iterations, plus once for the initialization.
However, I suspect that there must be a better (and possibly the only correct) way to prove it using binomial distribution with p = 1/2. This way, the average number of updates on m would be
$M = 1 + \sum_{k=1}^{n-1} k \binom{n}{k} p^k (1-p)^{n-k}$
where $\binom{n}{k}$ is the binomial coefficient. I tried to solve this, but I got stuck after a few steps because I do not know how to continue.
Could someone explain to me which of the two answers is correct and, if it is the second one, show me how to calculate M?
Thank you for your time
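A runnable Python version of the pseudocode above, which also counts how many times m is assigned (the initialisation included); `count_min_updates` is a hypothetical name. Averaging it over every permutation of a small array gives the exact expected count to compare the two candidate answers against.

```python
from fractions import Fraction
from itertools import permutations


def count_min_updates(a: list[int]) -> int:
    """Run the loop from the question and count assignments to m."""
    m = a[0]               # m = A[1] in the 1-based pseudocode
    updates = 1            # the initialisation counts as one assignment
    for x in a[1:]:        # for i = 2 to n
        if x < m:
            m = x
            updates += 1
    return updates


n = 6
perms = list(permutations(range(1, n + 1)))
average = Fraction(sum(count_min_updates(list(p)) for p in perms), len(perms))
print(average, float(average))  # 49/20 2.45
```

For n = 6 the "half of the iterations plus one" guess would give 3.5, while the exact average is 49/20 = 1 + 1/2 + ... + 1/6.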

Assuming the elements of the array are distinct, the expected number of updates of m is the nth harmonic number, Hn, which is the sum of 1/k for k ranging from 1 to n.
The summation formula can also be represented by the recursion:
H1 = 1
Hn = Hn−1 + 1/n (n > 1)
It's easy to see that the recursion corresponds to the problem.
Consider all permutations of n−1 numbers, and assume that the expected number of assignments is Hn−1. Now, every permutation of n numbers consists of a permutation of n−1 numbers, with a new smallest number inserted in one of n possible insertion points: either at the beginning, or after one of the n−1 existing values. Since it is smaller than every number in the existing series, it will only be assigned to m in the case that it was inserted at the beginning. That has a probability of 1/n, and so the expected number of assignments of a permutation of n numbers is Hn−1 + 1/n.
Since the expected number of assignments for a vector of length one is obviously 1, which is H1, we have an inductive proof of the recursion.
Hn is asymptotically equal to ln n + γ, where γ is the Euler-Mascheroni constant, approximately 0.577. So it increases without limit, but quite slowly.
The values for which m is updated are called left-to-right minima (the mirror image of left-to-right maxima, which have the same distribution), and you'll probably find more information about them by searching for those terms.
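A quick numerical look at the asymptotic claim, as a sketch with the value of γ hard-coded to 16 digits: Hn - ln n - γ shrinks roughly like 1/(2n), while Hn itself only reaches about 12 at n = 100,000.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, hard-coded


def harmonic_float(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n, summed with math.fsum to limit rounding error."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


for n in (10, 1_000, 100_000):
    h = harmonic_float(n)
    # The last column shrinks roughly like 1/(2n).
    print(n, round(h, 4), h - math.log(n) - EULER_GAMMA)
```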

I liked @rici's answer, so I decided to elaborate on its central argument a little more to make it clearer to me.
Let H[k] be the expected number of assignments needed to compute the min m of an array of length k, as indicated in the algorithm under consideration. We know that
H[1] = 1.
Now assume we have an array of length n > 1. The min can be in the last position of the array or not. It is in the last position with probability 1/n. It is not with probability 1 - 1/n. In the first case the expected number of assignments is H[n-1] + 1. In the second, H[n-1].
If we multiply the expected number of assignments of each case by their probabilities and sum, we get
H[n] = (H[n-1] + 1)*1/n + H[n-1]*(1 - 1/n)
= H[n-1]*1/n + 1/n + H[n-1] - H[n-1]*1/n
= 1/n + H[n-1]
which shows the recursion.
Note that the argument is valid only if the min is either in the last position or in one of the first n-1 positions, not in both places. Thus we are using the fact that all the elements of the array are different.

Select K unique random numbers from range with sum equal to S

I have a range
R = {0, ..., N}
and I would like to get K distinct elements whose sum equals S, but the elements should be selected randomly.
An easy brute-force method would be to determine all combinations of K elements that sum to S and pick one of them at random.
I am trying to think of a recursive solution where a random number K0 is selected and the problem reduces to finding (K-1) random numbers with sum equal to (S - K0), but this need not yield a solution.
Is there a better approach?
A sample would be:
R = {0,1,2,3,4,5}, S = 5, K = 2
Solutions: randomly pick one of {{1,4};{2,3};{0,5}}
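The brute-force method from the question, as a Python sketch (the function name is hypothetical): it lists every K-combination of {0..N} with sum S and picks one uniformly with `random.Random.choice`. It is exactly uniform, but only practical while the number of combinations stays small.

```python
import random
from itertools import combinations
from typing import Optional


def random_subset_with_sum(n: int, k: int, s: int,
                           rng: random.Random) -> Optional[tuple[int, ...]]:
    """Uniformly pick one k-element subset of {0..n} summing to s, or None if none exists."""
    solutions = [c for c in combinations(range(n + 1), k) if sum(c) == s]
    return rng.choice(solutions) if solutions else None


print(random_subset_with_sum(5, 2, 5, random.Random(7)))  # one of (0, 5), (1, 4), (2, 3)
```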

In general, if K is big (and hence N too) and S is not too small, the outcome is unpredictable, because there are too many combinations.
Brute force: try every combination. You are sure to find a solution if one exists, but if there are more than, say, a billion combinations, it is almost impossible to list them all.
Your algorithm:
To choose at random, your algorithm is fine: take one number at random, then another, ...
But you make an assumption, namely that a solution exists containing the numbers you picked, and you don't know that.
So what? If statistically there exist many solutions, you could find one like that, perhaps, or perhaps not.
Some leads:
1 Use S/K
If every number is < S/K, it is impossible.
If every number is > S/K, it is impossible.
So let's assume that there are numbers < S/K and others > S/K.
2 Keep only numbers ≤ S; this is very useful if S is small.
3 Idea: if S is big and the numbers are small, there is a good chance that many combinations exist.
Idea of an algorithm:
1 Take one number N1 at random.
2 If N1 < S/K, take another one, N2 > S/K.
3 Calculate N1 + N2: if it is < 2·S/K, take another one, N3 > S/K; if not, take N3 < S/K.
4 Iterate: at each step, if the sum of the first n numbers is < n·S/K, take another one > S/K; if not, take one < S/K.
5 You can get better precision by replacing S/K with (S - (N1 + N2 + ... + Nn))/(K - n).
If at some step you cannot find any suitable number, backtrack.
hope it helps
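A small sketch of the "pick at random, backtrack on dead ends" idea, under assumptions not in the answer: candidates are simply tried in a random order, and pruning uses the k smallest and k largest remaining numbers to bound the reachable sum, instead of the S/K steering described above. The function name is hypothetical. It returns the first solution reached, which is not uniformly distributed over all solutions, and the worst case is exponential.

```python
import random
from typing import Optional


def random_backtrack(n: int, k: int, s: int, rng: random.Random) -> Optional[list[int]]:
    """Find k distinct numbers in 0..n summing to s, trying candidates in random order."""

    def search(pool: list[int], k_left: int, s_left: int) -> Optional[list[int]]:
        if k_left == 0:
            return [] if s_left == 0 else None
        if len(pool) < k_left:
            return None
        ordered = sorted(pool)
        # Prune: the k_left smallest / largest remaining values bound any reachable sum.
        if sum(ordered[:k_left]) > s_left or sum(ordered[-k_left:]) < s_left:
            return None
        candidates = [x for x in pool if x <= s_left]
        rng.shuffle(candidates)
        for x in candidates:
            rest = search([y for y in pool if y != x], k_left - 1, s_left - x)
            if rest is not None:
                return [x] + rest
        return None  # dead end: the caller backtracks and tries its next candidate

    if k < 0 or s < 0:
        return None
    return search(list(range(n + 1)), k, s)


print(random_backtrack(10, 4, 30, random.Random(3)))
```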

I would start with the Dirichlet distribution (https://en.wikipedia.org/wiki/Dirichlet_distribution). Using it, you could uniformly sample random numbers Xi in (0, 1) such that Σi Xi = 1.
For S <= N, it is easy to see that sampling values beyond S is useless, and such samples should be rejected outright.
So, combining this with acceptance/rejection, something along these lines:
1. Divide the interval [0, 1] into S (or S+1 if 0 is allowed) equal bins.
2. Sample K numbers from the Dirichlet distribution.
3. Map the sampled numbers to bin indices, so you now have sampled integers which are all below or equal to S. Rounding each number down on its own loses part of the total, so the rounding has to be done jointly (for example, giving the lost units to the largest remainders) for the integers to sum to exactly S.
4. If all integers are distinct, accept the sample; otherwise reject it and go to step 2.
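A rough sketch of the Dirichlet idea, under assumptions that go beyond the answer: Dirichlet(1, ..., 1) is sampled by normalising independent Gamma(1, 1) draws from `random.gammavariate`, and the scaled values are rounded with the largest-remainder method so the integers add up to exactly S. Samples are rejected unless they are distinct and no larger than N. The function name and `max_tries` parameter are hypothetical, and the accepted samples are not claimed to be exactly uniform over all solutions.

```python
import random
from typing import Optional


def dirichlet_integer_sample(n: int, k: int, s: int, rng: random.Random,
                             max_tries: int = 100_000) -> Optional[list[int]]:
    """Sketch: k distinct integers in 0..n summing to s, via Dirichlet + rejection."""
    if k < 1 or s < 0:
        raise ValueError("need k >= 1 and s >= 0")
    for _ in range(max_tries):
        # Normalised Gamma(1, 1) draws form a Dirichlet(1, ..., 1) sample.
        g = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
        total = sum(g)
        scaled = [x / total * s for x in g]
        ints = [int(v) for v in scaled]         # floor, since every v >= 0
        leftover = s - sum(ints)                 # units lost to flooring, 0 <= leftover <= k
        # Largest-remainder rounding: give the lost units to the biggest fractional parts.
        by_remainder = sorted(range(k), key=lambda i: scaled[i] - ints[i], reverse=True)
        for i in by_remainder[:leftover]:
            ints[i] += 1
        if len(set(ints)) == k and max(ints) <= n:  # accept only distinct values within range
            return ints
    return None  # gave up; the constraints may be infeasible


print(dirichlet_integer_sample(20, 4, 30, random.Random(5)))
```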

Finding even numbers in an array without using feedback

I saw this post: Finding even numbers in an array and I was thinking about how you could do it without feedback. Here's what I mean.
Given an array of length n containing at most e even numbers and a
function isEven that returns true if the input is even and false
otherwise, write a function that prints all the even numbers in the
array using the fewest number of calls to isEven.
The answer on the post was to use a binary search, which is neat since it doesn't require the array to be sorted: the product of a group of numbers is even exactly when the group contains an even number. The number of times you have to check whether a number is even is e log n instead of n, because you do a binary search (log n) to find each even number (e times).
But that idea means that you divide the array in half, test for evenness, then decide which half to keep based on the result.
My question is whether or not you can beat n calls on a fixed testing scheme where you check all the numbers you want for evenness without knowing the outcome, and then figure out where the even numbers are after you've done all the tests based on the results. So I guess it's no-feedback or blind or some term like that.
I was thinking about this for a while and couldn't come up with anything. The binary search idea doesn't work at all with this constraint, but maybe something else does? Even getting down to n/2 calls instead of n (yes, I know they are the same big-O) would be good.
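For comparison, a minimal sketch of the adaptive approach, assuming (as in the linked post) that isEven may be applied to the product of a sub-array, which is even exactly when the sub-array contains an even number. It uses recursive halving rather than e separate binary searches, and the call counter and function name are illustrative additions.

```python
from math import prod


def find_evens_adaptive(a: list[int]) -> tuple[list[int], int]:
    """Return the even elements of a and the number of isEven calls used."""
    calls = 0

    def is_even(x: int) -> bool:
        nonlocal calls
        calls += 1
        return x % 2 == 0

    found: list[int] = []

    def search(lo: int, hi: int) -> None:
        # Test the half-open slice a[lo:hi]; an odd product means no evens inside.
        if lo >= hi or not is_even(prod(a[lo:hi])):
            return
        if hi - lo == 1:
            found.append(a[lo])
            return
        mid = (lo + hi) // 2
        search(lo, mid)   # which half we explore next depends on earlier answers:
        search(mid, hi)   # this is exactly the feedback the question wants to avoid

    search(0, len(a))
    return found, calls


sparse = [2 * i + 1 for i in range(1024)]
sparse[100], sparse[900] = 6, 10
print(find_evens_adaptive(sparse))  # the two evens, found with far fewer than 1024 calls
```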

The technical term for "no-feedback or blind" is "non-adaptive". O(e log n) calls still suffice, but the algorithm is rather more involved.
Instead of testing the evenness of products, we're going to test the evenness of sums. Let E ≠ F be distinct subsets of {1, …, n}. If we have one array x1, …, xn with even numbers at positions E and another array y1, …, yn with even numbers at positions F, how many subsets J of {1, …, n} satisfy
$\left(\sum_{i \in J} x_i\right) \bmod 2 \ne \left(\sum_{i \in J} y_i\right) \bmod 2$?
The answer is $2^{n-1}$. Let i be an index such that $x_i \bmod 2 \ne y_i \bmod 2$. Let S be a subset of {1, …, i - 1, i + 1, …, n}. Either J = S is a solution or J = S ∪ {i} is a solution, but not both.
For every possible outcome E, we need to make calls that eliminate every other possible outcome F. Suppose we make 2e log n calls at random (logarithms are base 2). For each pair E ≠ F, the probability that we still cannot distinguish E from F is $(2^{n-1}/2^n)^{2e \log n} = n^{-2e}$, because there are $2^n$ possible calls and $2^{n-1}$ of them fail to distinguish. There are at most $n^e + 1$ choices of E and thus at most $(n^e + 1)n^e/2$ pairs. By a union bound, the probability that there exists some indistinguishable pair is at most $n^{-2e}(n^e + 1)n^e/2 < 1$ (assuming we're looking at an interesting case where e ≥ 1 and n ≥ 2), so there exists a sequence of 2e log n calls that does the job.
Note that, while I've used randomness to show that a good sequence of calls exists, the resulting algorithm is deterministic (and, of course, non-adaptive, because we chose that sequence without knowledge of the outcomes).
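A small-scale sketch of this argument, with assumptions beyond the answer: each test J is a uniformly random subset (every index included with probability 1/2), there are 2e·⌈log2 n⌉ of them, and the draw is repeated until a brute-force check confirms that every candidate set of at most e even positions produces a different outcome vector. Decoding is then a table lookup. All names are hypothetical, and enumerating the candidates is only practical for small n and e.

```python
import itertools
import math
import random


def candidate_sets(n: int, e: int):
    """Every set of at most e positions out of range(n), as sorted tuples."""
    for size in range(e + 1):
        yield from itertools.combinations(range(n), size)


def predicted(tests: list[frozenset], evens) -> tuple[bool, ...]:
    """isEven(sum over J) for each test J, if exactly the positions in evens hold even numbers."""
    evens = set(evens)
    # A sum is even iff it has an even number of odd terms: |J| - |J ∩ evens| is even.
    return tuple((len(j) - len(j & evens)) % 2 == 0 for j in tests)


def design_tests(n: int, e: int, rng: random.Random):
    """Draw 2e*ceil(log2 n) random subsets; redraw until all candidates are distinguishable."""
    m = 2 * e * max(1, math.ceil(math.log2(n)))
    while True:
        tests = [frozenset(i for i in range(n) if rng.random() < 0.5) for _ in range(m)]
        table = {}
        for cand in candidate_sets(n, e):
            key = predicted(tests, cand)
            if key in table:
                break                      # two candidates look identical: redraw
            table[key] = cand
        else:
            return tests, table


def find_evens_nonadaptive(a: list[int], tests, table) -> tuple[int, ...]:
    # Every call is fixed in advance; none depends on an earlier outcome.
    outcomes = tuple(sum(a[i] for i in j) % 2 == 0 for j in tests)
    return table[outcomes]


tests, table = design_tests(8, 2, random.Random(0))
print(find_evens_nonadaptive([1, 1, 2, 1, 1, 1, 4, 1], tests, table))  # positions 2 and 6
```

At this toy size the 12 tests exceed n = 8; the saving only appears once e log n is much smaller than n, where this brute-force decoder would no longer be practical.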

You can use the Chinese Remainder Theorem to do this. I'm going to change your notation a bit.
Suppose you have N numbers of which at most E are even. Choose a sequence of prime powers q1, q2, ..., qk of distinct primes (so that they are pairwise coprime) such that their product is at least N^E, i.e.
qi = pi^ei
where pi is prime and ei > 0 is an integer and
q1 * q2 * ... * qk >= N^E
Now make a bunch of 0-1 matrices. Let Mi be the qi x N matrix where the entry in row r and column c is 1 if c ≡ r (mod qi) and 0 otherwise. For example, if qi = 3^2 = 9, then row 2 has ones in columns 2, 11, 20, ..., 2 + 9j and zeros elsewhere.
Now stack these matrices vertically to get a Q x N matrix M, where Q = q1 + q2 + ... + qk. The rows of M tell you which numbers to multiply together (the nonzero positions). This gives a total of Q products that you need to test for evenness. Call each row a "trial", and say that a "trial involves j" if the jth column of that row is nonzero. The theorem you need is the following:
THEOREM: The number in position j is even if and only if all trials involving j are even.
So you do a total of Q trials and then look at the results. If you choose the prime powers intelligently, then Q should be significantly smaller than N. There are asymptotic results that show you can always get Q on the order of
$\frac{(2E \log N)^2}{2 \log(2E \log N)}$
This theorem is actually a corollary of the Chinese Remainder Theorem. The only place that I've seen this used is in Combinatorial Group Testing. Apparently the problem originally arose during WWII, when screening draftees for syphilis.
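A sketch of the CRT construction, assuming the moduli are simply the first few primes (prime powers with ei = 1, automatically pairwise coprime), taken until their product reaches N^E. Positions are 0-based, a parity check on each trial's product stands in for isEven, and all names are hypothetical.

```python
from math import prod


def crt_moduli(n: int, e: int) -> list[int]:
    """Consecutive primes 2, 3, 5, ... until their product reaches n**e (at least one)."""
    moduli: list[int] = []
    p = 2
    while not moduli or prod(moduli) < n ** e:
        if all(p % q for q in moduli):   # moduli holds every prime below p
            moduli.append(p)
        p += 1
    return moduli


def crt_trials(n: int, e: int) -> list[list[int]]:
    """One trial per (q, r): the positions j with j ≡ r (mod q)."""
    return [[j for j in range(n) if j % q == r]
            for q in crt_moduli(n, e) for r in range(q)]


def decode(a: list[int], trials: list[list[int]]) -> list[int]:
    # All trial outcomes are computed up front; none depends on another.
    outcome = [prod(a[j] for j in t) % 2 == 0 for t in trials]
    involved: list[list[int]] = [[] for _ in a]
    for t_index, t in enumerate(trials):
        for j in t:
            involved[j].append(t_index)
    # THEOREM: position j is even iff every trial involving j is even.
    return [j for j in range(len(a)) if all(outcome[t] for t in involved[j])]


trials = crt_trials(1000, 2)
print(len(trials))        # 77 trials instead of 1000 single-number tests
a = [2 * i + 1 for i in range(1000)]
a[3], a[700] = 4, 8
print(decode(a, trials))  # [3, 700]
```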

The problem you are facing is a form of group testing, a type of problem whose objective is to reduce the cost of identifying certain elements of a set (up to d elements of a set of N elements).
As you've already stated, there are two basic principles via which the testing may be carried out:
Non-adaptive Group Testing, where all the tests to be performed are decided a priori.
Adaptive Group Testing, where we perform several tests, basing each test on the outcome of previous tests. Obviously, adaptive testing has a potential to reduce the cost, compared to non-adaptive testing.
Theoretical bounds for both principles have been studied, and are available in this Wiki article, or this paper.
For adaptive testing, the upper bound is O(d*log(N)) (as already described in this answer).
For non-adaptive testing, it can be shown that O(d^2*log(N)) tests suffice (and that on the order of d^2/log(d)*log(N) tests are necessary), which is larger than the adaptive bound by a factor of at least d/log(d).
This upper bound for non-adaptive testing comes from an algorithm which uses disjunct matrices: T x N matrices ("number of tests" x "number of elements") whose entries are true if an element is included in a test and false if it isn't, with the property that no column is contained in the Boolean OR of any d other columns. This allows decoding in time linear in the size of the matrix (there are also "d-separable" matrices, where fewer tests are needed, but decoding them takes exponential time and is not computationally feasible).
Conclusion:
My question is whether or not you can beat n calls on a fixed testing scheme [...]
For such a scheme and a sufficiently large value of N, a disjunct matrix can be constructed with fewer than K * d^2 * log(N) rows for some constant K. Since that grows only logarithmically in N, for large values of N, yes, you can beat n calls.

The underlying question (challenge) is kind of silly. If the binary search answer is acceptable (where it combines sub-arrays and sends the result to IsEven), then I can think of a way to do it with E or fewer calls to IsEven (assuming the numbers are integers, of course).
JavaScript to demonstrate
// sort the array by only the lowest bit of each number
A.sort(function(x, y) { return (x & 1) - (y & 1); });
// all of the evens will be at the beginning
for (var i = 0; i < E && i < A.length; i++) {
    if (IsEven(A[i]))
        Print(A[i]);
    else
        break;
}

Not exactly a solution, but just few thoughts.
It is easy to see that if a solution exists for array length n that takes fewer than n tests, then for any array length m > n there is always a solution with fewer than m tests. So, if you have a solution for n = 2, 3 or 4, then the problem is solved.
You can split the array into pairs of numbers and, for each pair: if the sum is odd, then exactly one of them is even; otherwise, if one of the numbers is even, then both of them are even. This way each pair takes either one or two tests. Best case: n/2 tests; worst case: n tests; if even and odd numbers are chosen with equal probability, then 3n/4 tests on average.
My hunch is there is no solution with less than n tests. Not sure how to prove it.
UPDATE: The second solution can be extended in the following way.
Check whether the sum of two numbers is even. If it is odd, then exactly one of them is even. Otherwise, label the pair a "homogeneous set of size 2". Take two homogeneous sets of the same size n. Pick one number from each set and check whether their sum is even. If it is even, combine these two sets into a "homogeneous set of size 2n". Otherwise, one of those sets consists purely of even numbers and the other purely of odd numbers.
Best case: n/2 tests. Average case: 3*n/2. Worst case is still n. The worst case occurs only when all the numbers are even or all the numbers are odd.

If we can add and multiply array elements, then we can compute every Boolean function (up to complementation) on the low-order bits. Simulate a circuit that encodes the positions of the even numbers as a number from 0 to C(n,0) + C(n,1) + ... + C(n,e) - 1 represented in binary, and use calls to isEven to read off the bits.
Number of calls used: within 1 of the information-theoretic optimum.
See also fully homomorphic encryption.

Greatest GCD between some numbers

We've got some nonnegative numbers. We want to find the pair with the maximum gcd. Actually, this maximum is more important than the pair!
For example if we have:
2 4 5 15
gcd(2,4)=2
gcd(2,5)=1
gcd(2,15)=1
gcd(4,5)=1
gcd(4,15)=1
gcd(5,15)=5
The answer is 5.

You can use the Euclidean Algorithm to find the GCD of two numbers.
int gcd(int a, int b)
{
    while (b != 0)
    {
        int m = a % b;
        a = b;
        b = m;
    }
    return a;
}
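Applying that to every pair gives the straightforward O(N^2) baseline that the other answers try to improve on. A Python sketch, using the standard library's `math.gcd` (which returns the same value as the loop above) and a hypothetical function name:

```python
import math
from itertools import combinations


def max_pair_gcd_bruteforce(values: list[int]) -> int:
    """Largest gcd over all pairs of positions: the obvious O(N^2) baseline."""
    if len(values) < 2:
        raise ValueError("need at least two numbers")
    return max(math.gcd(x, y) for x, y in combinations(values, 2))


print(max_pair_gcd_bruteforce([2, 4, 5, 15]))  # 5
```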

If you want an alternative to the obvious algorithm, then assuming your numbers are in a bounded range, and you have plenty of memory, you can beat O(N^2) time, N being the number of values:
Create an array of a small integer type, indexes 1 to the max input. O(1)
For each value, increment the count at every index that is a factor of the number (make sure the counts don't wrap around). O(N).
Starting at the end of the array, scan back until you find a value >= 2. O(1)
That tells you the max gcd, but doesn't tell you which pair produced it. For your example input, the computed array looks like this:
index:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
count:  4  2  1  1  2  0  0  0  0  0  0  0  0  0  1
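A Python sketch of the counting array for positive integers, assuming plain trial division up to √v to enumerate each value's factors (no memoisation or prime list); the function names are hypothetical.

```python
import math


def divisor_counts(values: list[int]) -> list[int]:
    """counts[d] = how many input values d divides (index 0 unused)."""
    counts = [0] * (max(values) + 1)
    for v in values:
        for d in range(1, math.isqrt(v) + 1):   # trial division up to sqrt(v)
            if v % d == 0:
                counts[d] += 1
                if d != v // d:                   # do not count a square root twice
                    counts[v // d] += 1
    return counts


def max_gcd_by_counts(values: list[int]) -> int:
    counts = divisor_counts(values)
    # Scan from the top: the first index hit by two or more values is the max gcd.
    for d in range(len(counts) - 1, 0, -1):
        if counts[d] >= 2:
            return d
    raise ValueError("need at least two positive numbers")


print(divisor_counts([2, 4, 5, 15])[1:])  # [4, 2, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(max_gcd_by_counts([2, 4, 5, 15]))   # 5
```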
I don't know whether this is actually any faster for the inputs you have to handle. The constant factors involved are large: the bound on your values and the time to factorise a value within that bound.
You don't have to factorise each value - you could use memoisation and/or a pregenerated list of primes. Which gives me the idea that if you are memoising the factorisation, you don't need the array:
Create an empty set of int, and a best-so-far value 1.
For each input integer:
if it's less than or equal to best-so-far, continue.
check whether it's in the set. If so, best-so-far = max(best-so-far, this-value), continue. If not:
add it to the set
repeat for all of its factors (larger than best-so-far).
Add/lookup in a set could be O(log N), although it depends what data structure you use. Each value has O(f(k)) factors, where k is the max value and I can't remember what the function f is...
The reason that you're finished with a value as soon as you encounter it in the set is that you've found a number which is a common factor of two input values. If you keep factorising, you'll only find smaller such numbers, which are not interesting.
I'm not quite sure what the best way is to repeat for the larger factors. I think in practice you might have to strike a balance: you don't want to do them quite in decreasing order because it's awkward to generate ordered factors, but you also don't want to actually find all the factors.
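A sketch of the set-based variant for positive integers. It takes the simple route of generating all divisors by trial division and visiting them in decreasing order, which is not necessarily the best balance discussed above; the function names are hypothetical.

```python
import math


def divisors_descending(v: int) -> list[int]:
    """All divisors of v (v >= 1), largest first."""
    small = [d for d in range(1, math.isqrt(v) + 1) if v % d == 0]  # ascending, <= sqrt(v)
    large = [v // d for d in small if d != v // d]                   # descending, > sqrt(v)
    return large + small[::-1]


def max_gcd_memo(values: list[int]) -> int:
    seen: set[int] = set()   # divisors recorded so far (only those above best at the time)
    best = 1
    for v in values:
        if v <= best:
            continue
        for d in divisors_descending(v):
            if d <= best:
                break         # smaller divisors cannot improve the answer
            if d in seen:
                best = d      # d divides v and some earlier value
                break         # every later divisor of v is smaller
            seen.add(d)
    return best


print(max_gcd_memo([2, 4, 5, 15]))  # 5
```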
Even in the realms of O(N^2), you might be able to beat the use of the Euclidean algorithm:
Fully factorise each number, storing it as a sequence of exponents of primes (so for example 2 is {1}, 4 is {2}, 5 is {0, 0, 1}, 15 is {0, 1, 1}). Then you can calculate gcd(a,b) by taking the min value at each index and multiplying them back out. No idea whether this is faster than Euclid on average, but it might be. Obviously it uses a load more memory.
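A sketch of the exponent-vector representation, assuming a fixed list of primes that fully factors every input (the vectors here all have the same length rather than being trimmed); names are hypothetical.

```python
import math


def exponent_vector(v: int, primes: list[int]) -> list[int]:
    """Exponent of each prime in v; v must be >= 1 and factor completely over primes."""
    if v < 1:
        raise ValueError("v must be a positive integer")
    exps = []
    for p in primes:
        e = 0
        while v % p == 0:
            v //= p
            e += 1
        exps.append(e)
    if v != 1:
        raise ValueError("primes list too short for this value")
    return exps


def gcd_from_exponents(a: list[int], b: list[int], primes: list[int]) -> int:
    # The gcd keeps, for every prime, the smaller of the two exponents.
    return math.prod(p ** min(x, y) for p, x, y in zip(primes, a, b))


PRIMES = [2, 3, 5]
vectors = {v: exponent_vector(v, PRIMES) for v in (2, 4, 5, 15)}
print(vectors)  # {2: [1, 0, 0], 4: [2, 0, 0], 5: [0, 0, 1], 15: [0, 1, 1]}
print(gcd_from_exponents(vectors[5], vectors[15], PRIMES))  # 5
```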

The optimisations I can think of are:
1) start with the two biggest numbers since they are likely to have most prime factors and thus likely to have the most shared prime factors (and thus the highest GCD).
2) When calculating the GCDs of other pairs you can stop your Euclidean algorithm loop if you get below your current greatest GCD.
Off the top of my head I can't think of a way that you can work out the greatest GCD of a pair without trying to work out each pair individually (and optimise a bit as above).
Disclaimer: I've never looked at this problem before and the above is off the top of my head. There may be better ways and I may be wrong. I'm happy to discuss my thoughts in more length if anybody wants. :)

There is no O(n log n) solution to this problem in general. In fact, if all you can do is compute pairwise GCDs, the worst case needs on the order of n^2 of them. Consider the following set of numbers:
2^20, 3^13, 5^9, 7^2*11^4, 7^4*11^3
Only the GCD of the last two is greater than 1, but the only way to know that from looking at the GCDs is to try out every pair and notice that one of them is greater than 1.
So you're stuck with the boring brute-force try-every-pair approach, perhaps with a couple of clever optimizations to avoid doing needless work when you've already found a large GCD (while making sure that you don't miss anything).

With some constraints, e.g. the numbers in the array are within a given range, say 1-1e7, it is doable in O(N log N) / O(MAX * log MAX), where MAX is the maximum possible value in A.
This is inspired by the sieve algorithm; I came across it in a HackerRank challenge, where it is done for two arrays. Check their editorial.
find max(A) - O(N)
create a count array (a binary mask is enough if A has no duplicates) to mark which elements of A appear in the given range, for O(1) lookup; O(N) to build; O(MAX_RANGE) storage.
for every candidate gcd a from max(A) down to 1:
    for aa = a; aa <= max(A); aa += a:
        if aa is in A, add its count to the counter for a
    if the counter for a is >= 2 (i.e. you have two numbers divisible by a), a is the max gcd: stop
    (store the two numbers found if you also need the pair)
since a goes downward, candidates smaller than the current max_gcd are never examined;
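A Python sketch of the sieve idea for positive integers, with candidate gcds running down from max(A) to 1, multiples visited up to and including max(A), and the counter kept per candidate gcd. It uses a count array rather than a bit mask so repeated values work; the function name is hypothetical.

```python
def max_gcd_sieve(values: list[int]) -> int:
    """Largest gcd of two entries, for positive integers up to a modest bound."""
    if len(values) < 2 or min(values) < 1:
        raise ValueError("need at least two positive integers")
    hi = max(values)
    count = [0] * (hi + 1)          # multiplicities, so duplicates are handled
    for v in values:
        count[v] += 1
    for g in range(hi, 0, -1):      # try the largest candidate gcd first
        multiples = 0
        for x in range(g, hi + 1, g):
            multiples += count[x]
            if multiples >= 2:      # two entries divisible by g, so their gcd is >= g
                return g
    return 1  # not reached: g = 1 divides every entry


print(max_gcd_sieve([2, 4, 5, 15]))  # 5
```

The inner loop runs max(A)/g times, so the total work is about max(A) * (1 + 1/2 + ... + 1/max(A)), i.e. O(MAX log MAX).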
Previous answer:
Still O(N^2) -- sort the array; should eliminate some of the unnecessary comparisons;
import math

# a: the input list of positive integers
max_gcd = 1
# assuming you want pairs of distinct elements (by position)
a.sort()  # in place, ascending
n = len(a)
for ii in range(n - 1, -1, -1):
    if a[ii] <= max_gcd:
        break
    for jj in range(ii - 1, -1, -1):
        if a[jj] <= max_gcd:
            break
        current_gcd = math.gcd(a[ii], a[jj])
        if current_gcd > max_gcd:
            max_gcd = current_gcd
This should save some unnecessary computation.

There is a solution that would take O(n):
Let our numbers be a_i. First, calculate m=a_0*a_1*a_2*.... For each number a_i, calculate gcd(m/a_i, a_i). The number you are looking for is the maximum of these values.
I haven't proved that this is always true, but in your example, it works:
m=2*4*5*15=600,
max(gcd(m/2,2), gcd(m/4,4), gcd(m/5,5), gcd(m/15,15))=max(2, 2, 5, 5)=5
NOTE: This is not correct. If the number a_i has a prime factor p_j repeated twice, and two other numbers also contain this factor p_j, then you get the incorrect result p_j^2 instead of p_j. For example, for the set 3, 5, 15, 25, you get 25 as the answer instead of 5.
However, you can still use this to quickly filter out numbers. For example, in the above case, once you determine the 25, you can first do the exhaustive search for a_3=25 with gcd(a_3, a_i) to find the real maximum, 5, then filter out gcd(m/a_i, a_i), i!=3 which are less than or equal to 5 (in the example above, this filters out all others).
Added for clarification and justification:
To see why this should work, note that gcd(a_i, a_j) divides gcd(m/a_i, a_i) for all j!=i.
Let's call gcd(m/a_i, a_i) as g_i, and max(gcd(a_i, a_j),j=1..n, j!=i) as r_i. What I say above is g_i=x_i*r_i, and x_i is an integer. It is obvious that r_i <= g_i, so in n gcd operations, we get an upper bound for r_i for all i.
The above claim is not very obvious. Let's examine it a bit deeper to see why it is true: the gcd of a_i and a_j is the product of all prime factors that appear in both a_i and a_j (by definition). Now, multiply a_j with another number, b. The gcd of a_i and b*a_j is either equal to gcd(a_i, a_j), or is a multiple of it, because b*a_j contains all prime factors of a_j, and some more prime factors contributed by b, which may also be included in the factorization of a_i. In fact, gcd(a_i, b*a_j)=gcd(a_i/gcd(a_i, a_j), b)*gcd(a_i, a_j), I think. But I can't see a way to make use of this. :)
Anyhow, in our construction, m/a_i is simply a shortcut to calculate the product of all a_j, where j=1..n, j!=i. As a result, gcd(m/a_i, a_i) contains every gcd(a_i, a_j) as a factor. So, obviously, the maximum of these individual gcd results divides g_i.
Now, the largest g_i is of particular interest to us: it is either the maximum gcd itself (if x_i is 1), or a good candidate for being one. To do that, we do another n-1 gcd operations, and calculate r_i explicitly. Then, we drop all g_j less than or equal to r_i as candidates. If we don't have any other candidate left, we are done. If not, we pick up the next largest g_k, and calculate r_k. If r_k <= r_i, we drop g_k, and repeat with another g_k'. If r_k > r_i, we filter out remaining g_j <= r_k, and repeat.
I think it is possible to construct a number set that will make this algorithm run in O(n^2) (if we fail to filter out anything), but on random number sets, I think it will quickly get rid of large chunks of candidates.
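A sketch of the filtering method for positive integers: it computes every bound g_i = gcd(m/a_i, a_i) and then verifies candidates in decreasing order of g_i, stopping as soon as no remaining bound can beat the best verified value. The function name is hypothetical; m is computed exactly with Python's big integers.

```python
import math


def max_gcd_filtered(values: list[int]) -> int:
    """Use g_i = gcd(m / a_i, a_i) as upper bounds and verify only promising i."""
    if len(values) < 2 or min(values) < 1:
        raise ValueError("need at least two positive integers")
    m = math.prod(values)
    bounds = sorted(((math.gcd(m // a, a), i) for i, a in enumerate(values)), reverse=True)
    best = 0
    for g_i, i in bounds:          # largest upper bound first
        if g_i <= best:
            break                  # no remaining index can beat the current best
        # r_i: the true maximum gcd involving position i (n - 1 gcd operations)
        r_i = max(math.gcd(values[i], values[j]) for j in range(len(values)) if j != i)
        best = max(best, r_i)
    return best


print(max_gcd_filtered([3, 5, 15, 25]))  # 5, although the largest bound g_i is 25
```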

pseudocode
function getGcdMax(array[])
    arrayUB = upperbound(array)
    if (arrayUB < 1)
        error
    pointerA = 0
    pointerB = 1
    gcdMax = 0
    do
        gcdMax = MAX(gcdMax, gcd(array[pointerA], array[pointerB]))
        pointerB++
        if (pointerB > arrayUB)
            pointerA++
            pointerB = pointerA + 1
    until (pointerB > arrayUB)
    return gcdMax
return gcdMax
